--- tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:44283 - loss:MultipleNegativesRankingLoss base_model: BAAI/bge-base-en-v1.5 widget: - source_sentence: What do detritivores eat? sentences: - Many organisms (of which humans are prime examples) eat from multiple levels of the food chain and, thus, make this classification problematic. A carnivore may eat both secondary and tertiary consumers, and its prey may itself be difficult to classify for similar reasons. Organisms showing both carnivory and herbivory are known as omnivores. Even herbivores such as the giant panda may supplement their diet with meat. Scavenging of carrion provides a significant part of the diet of some of the most fearsome predators. Carnivorous plants would be very difficult to fit into this classification, producing their own food but also digesting anything that they may trap. Organisms that eat detritivores or parasites would also be difficult to classify by such a scheme. - In an ecosystem, predation is a biological interaction where a predator (an organism that is hunting) feeds on its prey (the organism that is attacked). Predators may or may not kill their prey prior to feeding on them, but the act of predation often results in the death of the prey and the eventual absorption of the prey's tissue through consumption. Thus predation is often, though not always, carnivory. Other categories of consumption are herbivory (eating parts of plants), fungivory (eating parts of fungi), and detritivory (the consumption of dead organic material (detritus)). All these consumption categories fall under the rubric of consumer-resource systems. It can often be difficult to separate various types of feeding behaviors. For example, some parasitic species prey on a host organism and then lay their eggs on it for their offspring to feed on it while it continues to live in or on its decaying corpse after it has died. The key characteristic of predation however is the predator's direct impact on the prey population. On the other hand, detritivores simply eat dead organic material arising from the decay of dead individuals and have no direct impact on the "donor" organism(s). - Buddhism is practiced by an estimated 488 million,[web 1] 495 million, or 535 million people as of the 2010s, representing 7% to 8% of the world's total population. - source_sentence: Along with Georgia, what American jurisdiction allowed people to be executed for the rape of an adult prior to Coker? sentences: - 'The engagement was not without controversy: Philip had no financial standing, was foreign-born (though a British subject who had served in the Royal Navy throughout the Second World War), and had sisters who had married German noblemen with Nazi links. Marion Crawford wrote, "Some of the King''s advisors did not think him good enough for her. He was a prince without a home or kingdom. Some of the papers played long and loud tunes on the string of Philip''s foreign origin." Elizabeth''s mother was reported, in later biographies, to have opposed the union initially, even dubbing Philip "The Hun". In later life, however, she told biographer Tim Heald that Philip was "an English gentleman".' - In 1977, the Supreme Court's Coker v. Georgia decision barred the death penalty for rape of an adult woman, and implied that the death penalty was inappropriate for any offense against another person other than murder. Prior to the decision, the death penalty for rape of an adult had been gradually phased out in the United States, and at the time of the decision, the State of Georgia and the U.S. Federal government were the only two jurisdictions to still retain the death penalty for that offense. However, three states maintained the death penalty for child rape, as the Coker decision only imposed a ban on executions for the rape of an adult woman. In 2008, the Kennedy v. Louisiana decision barred the death penalty for child rape. The result of these two decisions means that the death penalty in the United States is largely restricted to cases where the defendant took the life of another human being. The current federal kidnapping statute, however, may be exempt because the death penalty applies if the victim dies in the perpetrator's custody, not necessarily by his hand, thus stipulating a resulting death, which was the wording of the objection. In addition, the Federal government retains the death penalty for non-murder offenses that are considered crimes against the state, including treason, espionage, and crimes under military jurisdiction. - In 1977, the Supreme Court's Coker v. Georgia decision barred the death penalty for rape of an adult woman, and implied that the death penalty was inappropriate for any offense against another person other than murder. Prior to the decision, the death penalty for rape of an adult had been gradually phased out in the United States, and at the time of the decision, the State of Georgia and the U.S. Federal government were the only two jurisdictions to still retain the death penalty for that offense. However, three states maintained the death penalty for child rape, as the Coker decision only imposed a ban on executions for the rape of an adult woman. In 2008, the Kennedy v. Louisiana decision barred the death penalty for child rape. The result of these two decisions means that the death penalty in the United States is largely restricted to cases where the defendant took the life of another human being. The current federal kidnapping statute, however, may be exempt because the death penalty applies if the victim dies in the perpetrator's custody, not necessarily by his hand, thus stipulating a resulting death, which was the wording of the objection. In addition, the Federal government retains the death penalty for non-murder offenses that are considered crimes against the state, including treason, espionage, and crimes under military jurisdiction. - source_sentence: Testing revealed today's dogs trace back by how many years? sentences: - 'The Schnellweg (en: expressway) system, a number of Bundesstraße roads, forms a structure loosely resembling a large ring road together with A2 and A7. The roads are B 3, B 6 and B 65, called Westschnellweg (B6 on the northern part, B3 on the southern part), Messeschnellweg (B3, becomes A37 near Burgdorf, crosses A2, becomes B3 again, changes to B6 at Seelhorster Kreuz, then passes the Hanover fairground as B6 and becomes A37 again before merging into A7) and Südschnellweg (starts out as B65, becomes B3/B6/B65 upon crossing Westschnellweg, then becomes B65 again at Seelhorster Kreuz).' - Although initially thought to have originated as a manmade variant of an extant canid species (variously supposed as being the dhole, golden jackal, or gray wolf), extensive genetic studies undertaken during the 2010s indicate that dogs diverged from an extinct wolf-like canid in Eurasia 40,000 years ago. Being the oldest domesticated animal, their long association with people has allowed dogs to be uniquely attuned to human behavior, as well as thrive on a starch-rich diet which would be inadequate for other canid species. - Although it is said that the "dog is man's best friend" regarding 17–24% of dogs in the developed countries, in the developing world they are feral, village or community dogs, with pet dogs uncommon. These live their lives as scavengers and have never been owned by humans, with one study showing their most common response when approached by strangers was to run away (52%) or respond with aggression (11%). We know little about these dogs, nor about the dogs that live in developed countries that are feral, stray or are in shelters, yet the great majority of modern research on dog cognition has focused on pet dogs living in human homes. - source_sentence: What kind of societies usually follow a regular daily schedule year-round? sentences: - Industrialized societies generally follow a clock-based schedule for daily activities that do not change throughout the course of the year. The time of day that individuals begin and end work or school, and the coordination of mass transit, for example, usually remain constant year-round. In contrast, an agrarian society's daily routines for work and personal conduct are more likely governed by the length of daylight hours and by solar time, which change seasonally because of the Earth's axial tilt. North and south of the tropics daylight lasts longer in summer and shorter in winter, the effect becoming greater as one moves away from the tropics. - Relatively insensitive film, with a correspondingly lower speed index, requires more exposure to light to produce the same image density as a more sensitive film, and is thus commonly termed a slow film. Highly sensitive films are correspondingly termed fast films. In both digital and film photography, the reduction of exposure corresponding to use of higher sensitivities generally leads to reduced image quality (via coarser film grain or higher image noise of other types). In short, the higher the sensitivity, the grainier the image will be. Ultimately sensitivity is limited by the quantum efficiency of the film or sensor. - Industrialized societies generally follow a clock-based schedule for daily activities that do not change throughout the course of the year. The time of day that individuals begin and end work or school, and the coordination of mass transit, for example, usually remain constant year-round. In contrast, an agrarian society's daily routines for work and personal conduct are more likely governed by the length of daylight hours and by solar time, which change seasonally because of the Earth's axial tilt. North and south of the tropics daylight lasts longer in summer and shorter in winter, the effect becoming greater as one moves away from the tropics. - source_sentence: How many people outside the UK were under British rule in 1945? sentences: - 20th Street starts at Avenue C, and 21st and 22nd Streets begin at First Avenue. They all end at Eleventh Avenue. Travel on the last block of the 20th, 21st and 22nd Streets, between Tenth and Eleventh Avenues, is in the opposite direction than it is on the rest of the respective street. 20th Street is very wide from the Avenue C to First Avenue. - By the start of the 20th century, Germany and the United States challenged Britain's economic lead. Subsequent military and economic tensions between Britain and Germany were major causes of the First World War, during which Britain relied heavily upon its empire. The conflict placed enormous strain on the military, financial and manpower resources of Britain. Although the British Empire achieved its largest territorial extent immediately after World War I, Britain was no longer the world's pre-eminent industrial or military power. In the Second World War, Britain's colonies in South-East Asia were occupied by Imperial Japan. Despite the final victory of Britain and its allies, the damage to British prestige helped to accelerate the decline of the empire. India, Britain's most valuable and populous possession, achieved independence as part of a larger decolonisation movement in which Britain granted independence to most territories of the Empire. The transfer of Hong Kong to China in 1997 marked for many the end of the British Empire. Fourteen overseas territories remain under British sovereignty. After independence, many former British colonies joined the Commonwealth of Nations, a free association of independent states. The United Kingdom is now one of 16 Commonwealth nations, a grouping known informally as the Commonwealth realms, that share one monarch—Queen Elizabeth II. - Though Britain and the empire emerged victorious from the Second World War, the effects of the conflict were profound, both at home and abroad. Much of Europe, a continent that had dominated the world for several centuries, was in ruins, and host to the armies of the United States and the Soviet Union, who now held the balance of global power. Britain was left essentially bankrupt, with insolvency only averted in 1946 after the negotiation of a $US 4.33 billion loan (US$56 billion in 2012) from the United States, the last instalment of which was repaid in 2006. At the same time, anti-colonial movements were on the rise in the colonies of European nations. The situation was complicated further by the increasing Cold War rivalry of the United States and the Soviet Union. In principle, both nations were opposed to European colonialism. In practice, however, American anti-communism prevailed over anti-imperialism, and therefore the United States supported the continued existence of the British Empire to keep Communist expansion in check. The "wind of change" ultimately meant that the British Empire's days were numbered, and on the whole, Britain adopted a policy of peaceful disengagement from its colonies once stable, non-Communist governments were available to transfer power to. This was in contrast to other European powers such as France and Portugal, which waged costly and ultimately unsuccessful wars to keep their empires intact. Between 1945 and 1965, the number of people under British rule outside the UK itself fell from 700 million to five million, three million of whom were in Hong Kong. pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy model-index: - name: SentenceTransformer based on BAAI/bge-base-en-v1.5 results: - task: type: triplet name: Triplet dataset: name: gooqa dev type: gooqa-dev metrics: - type: cosine_accuracy value: 0.41920000314712524 name: Cosine Accuracy --- # SentenceTransformer based on BAAI/bge-base-en-v1.5 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 dimensions - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-3-epochs") # Run inference sentences = [ 'How many people outside the UK were under British rule in 1945?', 'Though Britain and the empire emerged victorious from the Second World War, the effects of the conflict were profound, both at home and abroad. Much of Europe, a continent that had dominated the world for several centuries, was in ruins, and host to the armies of the United States and the Soviet Union, who now held the balance of global power. Britain was left essentially bankrupt, with insolvency only averted in 1946 after the negotiation of a $US 4.33 billion loan (US$56 billion in 2012) from the United States, the last instalment of which was repaid in 2006. At the same time, anti-colonial movements were on the rise in the colonies of European nations. The situation was complicated further by the increasing Cold War rivalry of the United States and the Soviet Union. In principle, both nations were opposed to European colonialism. In practice, however, American anti-communism prevailed over anti-imperialism, and therefore the United States supported the continued existence of the British Empire to keep Communist expansion in check. The "wind of change" ultimately meant that the British Empire\'s days were numbered, and on the whole, Britain adopted a policy of peaceful disengagement from its colonies once stable, non-Communist governments were available to transfer power to. This was in contrast to other European powers such as France and Portugal, which waged costly and ultimately unsuccessful wars to keep their empires intact. Between 1945 and 1965, the number of people under British rule outside the UK itself fell from 700 million to five million, three million of whom were in Hong Kong.', "By the start of the 20th century, Germany and the United States challenged Britain's economic lead. Subsequent military and economic tensions between Britain and Germany were major causes of the First World War, during which Britain relied heavily upon its empire. The conflict placed enormous strain on the military, financial and manpower resources of Britain. Although the British Empire achieved its largest territorial extent immediately after World War I, Britain was no longer the world's pre-eminent industrial or military power. In the Second World War, Britain's colonies in South-East Asia were occupied by Imperial Japan. Despite the final victory of Britain and its allies, the damage to British prestige helped to accelerate the decline of the empire. India, Britain's most valuable and populous possession, achieved independence as part of a larger decolonisation movement in which Britain granted independence to most territories of the Empire. The transfer of Hong Kong to China in 1997 marked for many the end of the British Empire. Fourteen overseas territories remain under British sovereignty. After independence, many former British colonies joined the Commonwealth of Nations, a free association of independent states. The United Kingdom is now one of 16 Commonwealth nations, a grouping known informally as the Commonwealth realms, that share one monarch—Queen Elizabeth II.", ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Evaluation ### Metrics #### Triplet * Dataset: `gooqa-dev` * Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator) | Metric | Value | |:--------------------|:-----------| | **cosine_accuracy** | **0.4192** | ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 44,283 training samples * Columns: question, context, and negative * Approximate statistics based on the first 1000 samples: | | question | context | negative | |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| | type | string | string | string | | details | | | | * Samples: | question | context | negative | |:--------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | About how many cubic meters of wood was used in 1991 to make products like glulam, LVL, and structural composite lumber? | These products include glued laminated timber (glulam), wood structural panels (including plywood, oriented strand board and composite panels), laminated veneer lumber (LVL) and other structural composite lumber (SCL) products, parallel strand lumber, and I-joists. Approximately 100 million cubic meters of wood was consumed for this purpose in 1991. The trends suggest that particle board and fiber board will overtake plywood. | Engineered wood products, glued building products "engineered" for application-specific performance requirements, are often used in construction and industrial applications. Glued engineered wood products are manufactured by bonding together wood strands, veneers, lumber or other forms of wood fiber with glue to form a larger, more efficient composite structural unit. | | Which Eclogues discusses homosexual love? | The biographical tradition asserts that Virgil began the hexameter Eclogues (or Bucolics) in 42 BC and it is thought that the collection was published around 39–38 BC, although this is controversial. The Eclogues (from the Greek for "selections") are a group of ten poems roughly modeled on the bucolic hexameter poetry ("pastoral poetry") of the Hellenistic poet Theocritus. After his victory in the Battle of Philippi in 42 BC, fought against the army led by the assassins of Julius Caesar, Octavian tried to pay off his veterans with land expropriated from towns in northern Italy, supposedly including, according to the tradition, an estate near Mantua belonging to Virgil. The loss of his family farm and the attempt through poetic petitions to regain his property have traditionally been seen as Virgil's motives in the composition of the Eclogues. This is now thought to be an unsupported inference from interpretations of the Eclogues. In Eclogues 1 and 9, Virgil indeed dramatizes the contra... | Sometime after the publication of the Eclogues (probably before 37 BC), Virgil became part of the circle of Maecenas, Octavian's capable agent d'affaires who sought to counter sympathy for Antony among the leading families by rallying Roman literary figures to Octavian's side. Virgil came to know many of the other leading literary figures of the time, including Horace, in whose poetry he is often mentioned, and Varius Rufus, who later helped finish the Aeneid. | | What nation lies to the west of the Marshall Islands? | The islands are located about halfway between Hawaii and Australia, north of Nauru and Kiribati, east of the Federated States of Micronesia, and south of the U.S. territory of Wake Island, to which it lays claim. The atolls and islands form two groups: the Ratak (sunrise) and the Ralik (sunset). The two island chains lie approximately parallel to one another, running northwest to southeast, comprising about 750,000 square miles (1,900,000 km2) of ocean but only about 70 square miles (180 km2) of land mass. Each includes 15 to 18 islands and atolls. The country consists of a total of 29 atolls and five isolated islands. | The climate is hot and humid, with a wet season from May to November. Many Pacific typhoons begin as tropical storms in the Marshall Islands region, and grow stronger as they move west toward the Mariana Islands and the Philippines. | * Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim" } ``` ### Evaluation Dataset #### Unnamed Dataset * Size: 5,000 evaluation samples * Columns: question, context, and negative_1 * Approximate statistics based on the first 1000 samples: | | question | context | negative_1 | |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | string | | details | | | | * Samples: | question | context | negative_1 | |:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | When did Simon Cowell announce he was no longer going to be a judge? | In season eight, Latin Grammy Award-nominated singer–songwriter and record producer Kara DioGuardi was added as a fourth judge. She stayed for two seasons and left the show before season ten. Paula Abdul left the show before season nine after failing to agree terms with the show producers. Emmy Award-winning talk show host Ellen DeGeneres replaced Paula Abdul for that season, but left after just one season. On January 11, 2010, Simon Cowell announced that he was leaving the show to pursue introducing the American version of his show The X Factor to the USA for 2011. Jennifer Lopez and Steven Tyler joined the judging panel in season ten, but both left after two seasons. They were replaced by three new judges, Mariah Carey, Nicki Minaj and Keith Urban, who joined Randy Jackson in season 12. However both Carey and Minaj left after one season, and Randy Jackson also announced that he would depart the show after twelve seasons as a judge but would return as a mentor. Urban is the only judge... | A special tribute to Simon Cowell was presented in the finale for his final season with the show. Many figures from the show's past, including Paula Abdul, made an appearance. | | When did Hodgson publish his DNA study? | According to an autosomal DNA study by Hodgson et al. (2014), the Afro-Asiatic languages were likely spread across Africa and the Near East by an ancestral population(s) carrying a newly identified non-African genetic component, which the researchers dub the "Ethio-Somali". This Ethio-Somali component is today most common among Afro-Asiatic-speaking populations in the Horn of Africa. It reaches a frequency peak among ethnic Somalis, representing the majority of their ancestry. The Ethio-Somali component is most closely related to the Maghrebi non-African genetic component, and is believed to have diverged from all other non-African ancestries at least 23,000 years ago. On this basis, the researchers suggest that the original Ethio-Somali carrying population(s) probably arrived in the pre-agricultural period from the Near East, having crossed over into northeastern Africa via the Sinai Peninsula. The population then likely split into two branches, with one group heading westward toward ... | Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid (DNA) was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s. The structure of DNA was studied by Rosalind Franklin using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication. Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics. | | What campus did Yale buy in 2008? | Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island. | Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island. | * Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim" } ``` ### Training Hyperparameters #### Non-Default Hyperparameters - `eval_strategy`: steps - `per_device_train_batch_size`: 128 - `per_device_eval_batch_size`: 128 - `warmup_ratio`: 0.1 - `fp16`: True - `batch_sampler`: no_duplicates #### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: steps - `prediction_loss_only`: True - `per_device_train_batch_size`: 128 - `per_device_eval_batch_size`: 128 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 1 - `eval_accumulation_steps`: None - `torch_empty_cache_steps`: None - `learning_rate`: 5e-05 - `weight_decay`: 0.0 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1.0 - `num_train_epochs`: 3 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.1 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: True - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: False - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `tp_size`: 0 - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - `adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: False - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: None - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `include_for_metrics`: [] - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `dispatch_batches`: None - `split_batches`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `eval_on_start`: False - `use_liger_kernel`: False - `eval_use_gather_object`: False - `average_tokens_across_devices`: False - `prompts`: None - `batch_sampler`: no_duplicates - `multi_dataset_batch_sampler`: proportional
### Training Logs | Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy | |:------:|:----:|:-------------:|:---------------:|:-------------------------:| | -1 | -1 | - | - | 0.3536 | | 0.2890 | 100 | 0.6663 | 0.7830 | 0.3922 | | 0.5780 | 200 | 0.4374 | 0.7399 | 0.4064 | | 0.8671 | 300 | 0.4048 | 0.7294 | 0.4086 | | 1.1561 | 400 | 0.3149 | 0.7244 | 0.4136 | | 1.4451 | 500 | 0.2378 | 0.7246 | 0.4182 | | 1.7341 | 600 | 0.2358 | 0.7179 | 0.4158 | | 2.0231 | 700 | 0.2338 | 0.7170 | 0.4240 | | 2.3121 | 800 | 0.1602 | 0.7293 | 0.4148 | | 2.6012 | 900 | 0.1595 | 0.7237 | 0.4230 | | 2.8902 | 1000 | 0.1545 | 0.7229 | 0.4146 | | -1 | -1 | - | - | 0.4192 | ### Framework Versions - Python: 3.11.0 - Sentence Transformers: 4.0.1 - Transformers: 4.50.3 - PyTorch: 2.6.0+cu124 - Accelerate: 1.5.2 - Datasets: 3.5.0 - Tokenizers: 0.21.1 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### MultipleNegativesRankingLoss ```bibtex @misc{henderson2017efficient, title={Efficient Natural Language Response Suggestion for Smart Reply}, author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, year={2017}, eprint={1705.00652}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```