ayushexel's picture
Add new SentenceTransformer model
7300a24 verified
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:44283
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: What do detritivores eat?
    sentences:
      - >-
        Many organisms (of which humans are prime examples) eat from multiple
        levels of the food chain and, thus, make this classification
        problematic. A carnivore may eat both secondary and tertiary consumers,
        and its prey may itself be difficult to classify for similar reasons.
        Organisms showing both carnivory and herbivory are known as omnivores.
        Even herbivores such as the giant panda may supplement their diet with
        meat. Scavenging of carrion provides a significant part of the diet of
        some of the most fearsome predators. Carnivorous plants would be very
        difficult to fit into this classification, producing their own food but
        also digesting anything that they may trap. Organisms that eat
        detritivores or parasites would also be difficult to classify by such a
        scheme.
      - >-
        In an ecosystem, predation is a biological interaction where a predator
        (an organism that is hunting) feeds on its prey (the organism that is
        attacked). Predators may or may not kill their prey prior to feeding on
        them, but the act of predation often results in the death of the prey
        and the eventual absorption of the prey's tissue through consumption.
        Thus predation is often, though not always, carnivory. Other categories
        of consumption are herbivory (eating parts of plants), fungivory (eating
        parts of fungi), and detritivory (the consumption of dead organic
        material (detritus)). All these consumption categories fall under the
        rubric of consumer-resource systems. It can often be difficult to
        separate various types of feeding behaviors. For example, some parasitic
        species prey on a host organism and then lay their eggs on it for their
        offspring to feed on it while it continues to live in or on its decaying
        corpse after it has died. The key characteristic of predation however is
        the predator's direct impact on the prey population. On the other hand,
        detritivores simply eat dead organic material arising from the decay of
        dead individuals and have no direct impact on the "donor" organism(s).
      - >-
        Buddhism is practiced by an estimated 488 million,[web 1] 495 million,
        or 535 million people as of the 2010s, representing 7% to 8% of the
        world's total population.
  - source_sentence: >-
      Along with Georgia, what American jurisdiction allowed people to be
      executed for the rape of an adult prior to Coker?
    sentences:
      - >-
        The engagement was not without controversy: Philip had no financial
        standing, was foreign-born (though a British subject who had served in
        the Royal Navy throughout the Second World War), and had sisters who had
        married German noblemen with Nazi links. Marion Crawford wrote, "Some of
        the King's advisors did not think him good enough for her. He was a
        prince without a home or kingdom. Some of the papers played long and
        loud tunes on the string of Philip's foreign origin." Elizabeth's mother
        was reported, in later biographies, to have opposed the union initially,
        even dubbing Philip "The Hun". In later life, however, she told
        biographer Tim Heald that Philip was "an English gentleman".
      - >-
        In 1977, the Supreme Court's Coker v. Georgia decision barred the death
        penalty for rape of an adult woman, and implied that the death penalty
        was inappropriate for any offense against another person other than
        murder. Prior to the decision, the death penalty for rape of an adult
        had been gradually phased out in the United States, and at the time of
        the decision, the State of Georgia and the U.S. Federal government were
        the only two jurisdictions to still retain the death penalty for that
        offense. However, three states maintained the death penalty for child
        rape, as the Coker decision only imposed a ban on executions for the
        rape of an adult woman. In 2008, the Kennedy v. Louisiana decision
        barred the death penalty for child rape. The result of these two
        decisions means that the death penalty in the United States is largely
        restricted to cases where the defendant took the life of another human
        being. The current federal kidnapping statute, however, may be exempt
        because the death penalty applies if the victim dies in the
        perpetrator's custody, not necessarily by his hand, thus stipulating a
        resulting death, which was the wording of the objection. In addition,
        the Federal government retains the death penalty for non-murder offenses
        that are considered crimes against the state, including treason,
        espionage, and crimes under military jurisdiction.
      - >-
        In 1977, the Supreme Court's Coker v. Georgia decision barred the death
        penalty for rape of an adult woman, and implied that the death penalty
        was inappropriate for any offense against another person other than
        murder. Prior to the decision, the death penalty for rape of an adult
        had been gradually phased out in the United States, and at the time of
        the decision, the State of Georgia and the U.S. Federal government were
        the only two jurisdictions to still retain the death penalty for that
        offense. However, three states maintained the death penalty for child
        rape, as the Coker decision only imposed a ban on executions for the
        rape of an adult woman. In 2008, the Kennedy v. Louisiana decision
        barred the death penalty for child rape. The result of these two
        decisions means that the death penalty in the United States is largely
        restricted to cases where the defendant took the life of another human
        being. The current federal kidnapping statute, however, may be exempt
        because the death penalty applies if the victim dies in the
        perpetrator's custody, not necessarily by his hand, thus stipulating a
        resulting death, which was the wording of the objection. In addition,
        the Federal government retains the death penalty for non-murder offenses
        that are considered crimes against the state, including treason,
        espionage, and crimes under military jurisdiction.
  - source_sentence: Testing revealed today's dogs trace back by how many years?
    sentences:
      - >-
        The Schnellweg (en: expressway) system, a number of Bundesstraße roads,
        forms a structure loosely resembling a large ring road together with A2
        and A7. The roads are B 3, B 6 and B 65, called Westschnellweg (B6 on
        the northern part, B3 on the southern part), Messeschnellweg (B3,
        becomes A37 near Burgdorf, crosses A2, becomes B3 again, changes to B6
        at Seelhorster Kreuz, then passes the Hanover fairground as B6 and
        becomes A37 again before merging into A7) and Südschnellweg (starts out
        as B65, becomes B3/B6/B65 upon crossing Westschnellweg, then becomes B65
        again at Seelhorster Kreuz).
      - >-
        Although initially thought to have originated as a manmade variant of an
        extant canid species (variously supposed as being the dhole, golden
        jackal, or gray wolf), extensive genetic studies undertaken during the
        2010s indicate that dogs diverged from an extinct wolf-like canid in
        Eurasia 40,000 years ago. Being the oldest domesticated animal, their
        long association with people has allowed dogs to be uniquely attuned to
        human behavior, as well as thrive on a starch-rich diet which would be
        inadequate for other canid species.
      - >-
        Although it is said that the "dog is man's best friend" regarding 17–24%
        of dogs in the developed countries, in the developing world they are
        feral, village or community dogs, with pet dogs uncommon. These live
        their lives as scavengers and have never been owned by humans, with one
        study showing their most common response when approached by strangers
        was to run away (52%) or respond with aggression (11%). We know little
        about these dogs, nor about the dogs that live in developed countries
        that are feral, stray or are in shelters, yet the great majority of
        modern research on dog cognition has focused on pet dogs living in human
        homes.
  - source_sentence: What kind of societies usually follow a regular daily schedule year-round?
    sentences:
      - >-
        Industrialized societies generally follow a clock-based schedule for
        daily activities that do not change throughout the course of the year.
        The time of day that individuals begin and end work or school, and the
        coordination of mass transit, for example, usually remain constant
        year-round. In contrast, an agrarian society's daily routines for work
        and personal conduct are more likely governed by the length of daylight
        hours and by solar time, which change seasonally because of the Earth's
        axial tilt. North and south of the tropics daylight lasts longer in
        summer and shorter in winter, the effect becoming greater as one moves
        away from the tropics.
      - >-
        Relatively insensitive film, with a correspondingly lower speed index,
        requires more exposure to light to produce the same image density as a
        more sensitive film, and is thus commonly termed a slow film. Highly
        sensitive films are correspondingly termed fast films. In both digital
        and film photography, the reduction of exposure corresponding to use of
        higher sensitivities generally leads to reduced image quality (via
        coarser film grain or higher image noise of other types). In short, the
        higher the sensitivity, the grainier the image will be. Ultimately
        sensitivity is limited by the quantum efficiency of the film or sensor.
      - >-
        Industrialized societies generally follow a clock-based schedule for
        daily activities that do not change throughout the course of the year.
        The time of day that individuals begin and end work or school, and the
        coordination of mass transit, for example, usually remain constant
        year-round. In contrast, an agrarian society's daily routines for work
        and personal conduct are more likely governed by the length of daylight
        hours and by solar time, which change seasonally because of the Earth's
        axial tilt. North and south of the tropics daylight lasts longer in
        summer and shorter in winter, the effect becoming greater as one moves
        away from the tropics.
  - source_sentence: How many people outside the UK were under British rule in 1945?
    sentences:
      - >-
        20th Street starts at Avenue C, and 21st and 22nd Streets begin at First
        Avenue. They all end at Eleventh Avenue. Travel on the last block of the
        20th, 21st and 22nd Streets, between Tenth and Eleventh Avenues, is in
        the opposite direction than it is on the rest of the respective street.
        20th Street is very wide from the Avenue C to First Avenue.
      - >-
        By the start of the 20th century, Germany and the United States
        challenged Britain's economic lead. Subsequent military and economic
        tensions between Britain and Germany were major causes of the First
        World War, during which Britain relied heavily upon its empire. The
        conflict placed enormous strain on the military, financial and manpower
        resources of Britain. Although the British Empire achieved its largest
        territorial extent immediately after World War I, Britain was no longer
        the world's pre-eminent industrial or military power. In the Second
        World War, Britain's colonies in South-East Asia were occupied by
        Imperial Japan. Despite the final victory of Britain and its allies, the
        damage to British prestige helped to accelerate the decline of the
        empire. India, Britain's most valuable and populous possession, achieved
        independence as part of a larger decolonisation movement in which
        Britain granted independence to most territories of the Empire. The
        transfer of Hong Kong to China in 1997 marked for many the end of the
        British Empire. Fourteen overseas territories remain under British
        sovereignty. After independence, many former British colonies joined the
        Commonwealth of Nations, a free association of independent states. The
        United Kingdom is now one of 16 Commonwealth nations, a grouping known
        informally as the Commonwealth realms, that share one monarch—Queen
        Elizabeth II.
      - >-
        Though Britain and the empire emerged victorious from the Second World
        War, the effects of the conflict were profound, both at home and abroad.
        Much of Europe, a continent that had dominated the world for several
        centuries, was in ruins, and host to the armies of the United States and
        the Soviet Union, who now held the balance of global power. Britain was
        left essentially bankrupt, with insolvency only averted in 1946 after
        the negotiation of a $US 4.33 billion loan (US$56 billion in 2012) from
        the United States, the last instalment of which was repaid in 2006. At
        the same time, anti-colonial movements were on the rise in the colonies
        of European nations. The situation was complicated further by the
        increasing Cold War rivalry of the United States and the Soviet Union.
        In principle, both nations were opposed to European colonialism. In
        practice, however, American anti-communism prevailed over
        anti-imperialism, and therefore the United States supported the
        continued existence of the British Empire to keep Communist expansion in
        check. The "wind of change" ultimately meant that the British Empire's
        days were numbered, and on the whole, Britain adopted a policy of
        peaceful disengagement from its colonies once stable, non-Communist
        governments were available to transfer power to. This was in contrast to
        other European powers such as France and Portugal, which waged costly
        and ultimately unsuccessful wars to keep their empires intact. Between
        1945 and 1965, the number of people under British rule outside the UK
        itself fell from 700 million to five million, three million of whom were
        in Hong Kong.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on BAAI/bge-base-en-v1.5
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: gooqa dev
          type: gooqa-dev
        metrics:
          - type: cosine_accuracy
            value: 0.41920000314712524
            name: Cosine Accuracy

SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-3-epochs")
# Run inference
sentences = [
    'How many people outside the UK were under British rule in 1945?',
    'Though Britain and the empire emerged victorious from the Second World War, the effects of the conflict were profound, both at home and abroad. Much of Europe, a continent that had dominated the world for several centuries, was in ruins, and host to the armies of the United States and the Soviet Union, who now held the balance of global power. Britain was left essentially bankrupt, with insolvency only averted in 1946 after the negotiation of a $US 4.33 billion loan (US$56 billion in 2012) from the United States, the last instalment of which was repaid in 2006. At the same time, anti-colonial movements were on the rise in the colonies of European nations. The situation was complicated further by the increasing Cold War rivalry of the United States and the Soviet Union. In principle, both nations were opposed to European colonialism. In practice, however, American anti-communism prevailed over anti-imperialism, and therefore the United States supported the continued existence of the British Empire to keep Communist expansion in check. The "wind of change" ultimately meant that the British Empire\'s days were numbered, and on the whole, Britain adopted a policy of peaceful disengagement from its colonies once stable, non-Communist governments were available to transfer power to. This was in contrast to other European powers such as France and Portugal, which waged costly and ultimately unsuccessful wars to keep their empires intact. Between 1945 and 1965, the number of people under British rule outside the UK itself fell from 700 million to five million, three million of whom were in Hong Kong.',
    "By the start of the 20th century, Germany and the United States challenged Britain's economic lead. Subsequent military and economic tensions between Britain and Germany were major causes of the First World War, during which Britain relied heavily upon its empire. The conflict placed enormous strain on the military, financial and manpower resources of Britain. Although the British Empire achieved its largest territorial extent immediately after World War I, Britain was no longer the world's pre-eminent industrial or military power. In the Second World War, Britain's colonies in South-East Asia were occupied by Imperial Japan. Despite the final victory of Britain and its allies, the damage to British prestige helped to accelerate the decline of the empire. India, Britain's most valuable and populous possession, achieved independence as part of a larger decolonisation movement in which Britain granted independence to most territories of the Empire. The transfer of Hong Kong to China in 1997 marked for many the end of the British Empire. Fourteen overseas territories remain under British sovereignty. After independence, many former British colonies joined the Commonwealth of Nations, a free association of independent states. The United Kingdom is now one of 16 Commonwealth nations, a grouping known informally as the Commonwealth realms, that share one monarch—Queen Elizabeth II.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4192

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,283 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    question context negative
    type string string string
    details
    • min: 6 tokens
    • mean: 14.62 tokens
    • max: 35 tokens
    • min: 31 tokens
    • mean: 152.26 tokens
    • max: 512 tokens
    • min: 32 tokens
    • mean: 153.89 tokens
    • max: 510 tokens
  • Samples:
    question context negative
    About how many cubic meters of wood was used in 1991 to make products like glulam, LVL, and structural composite lumber? These products include glued laminated timber (glulam), wood structural panels (including plywood, oriented strand board and composite panels), laminated veneer lumber (LVL) and other structural composite lumber (SCL) products, parallel strand lumber, and I-joists. Approximately 100 million cubic meters of wood was consumed for this purpose in 1991. The trends suggest that particle board and fiber board will overtake plywood. Engineered wood products, glued building products "engineered" for application-specific performance requirements, are often used in construction and industrial applications. Glued engineered wood products are manufactured by bonding together wood strands, veneers, lumber or other forms of wood fiber with glue to form a larger, more efficient composite structural unit.
    Which Eclogues discusses homosexual love? The biographical tradition asserts that Virgil began the hexameter Eclogues (or Bucolics) in 42 BC and it is thought that the collection was published around 39–38 BC, although this is controversial. The Eclogues (from the Greek for "selections") are a group of ten poems roughly modeled on the bucolic hexameter poetry ("pastoral poetry") of the Hellenistic poet Theocritus. After his victory in the Battle of Philippi in 42 BC, fought against the army led by the assassins of Julius Caesar, Octavian tried to pay off his veterans with land expropriated from towns in northern Italy, supposedly including, according to the tradition, an estate near Mantua belonging to Virgil. The loss of his family farm and the attempt through poetic petitions to regain his property have traditionally been seen as Virgil's motives in the composition of the Eclogues. This is now thought to be an unsupported inference from interpretations of the Eclogues. In Eclogues 1 and 9, Virgil indeed dramatizes the contra... Sometime after the publication of the Eclogues (probably before 37 BC), Virgil became part of the circle of Maecenas, Octavian's capable agent d'affaires who sought to counter sympathy for Antony among the leading families by rallying Roman literary figures to Octavian's side. Virgil came to know many of the other leading literary figures of the time, including Horace, in whose poetry he is often mentioned, and Varius Rufus, who later helped finish the Aeneid.
    What nation lies to the west of the Marshall Islands? The islands are located about halfway between Hawaii and Australia, north of Nauru and Kiribati, east of the Federated States of Micronesia, and south of the U.S. territory of Wake Island, to which it lays claim. The atolls and islands form two groups: the Ratak (sunrise) and the Ralik (sunset). The two island chains lie approximately parallel to one another, running northwest to southeast, comprising about 750,000 square miles (1,900,000 km2) of ocean but only about 70 square miles (180 km2) of land mass. Each includes 15 to 18 islands and atolls. The country consists of a total of 29 atolls and five isolated islands. The climate is hot and humid, with a wet season from May to November. Many Pacific typhoons begin as tropical storms in the Marshall Islands region, and grow stronger as they move west toward the Mariana Islands and the Philippines.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    question context negative_1
    type string string string
    details
    • min: 3 tokens
    • mean: 14.61 tokens
    • max: 52 tokens
    • min: 28 tokens
    • mean: 151.48 tokens
    • max: 494 tokens
    • min: 28 tokens
    • mean: 151.7 tokens
    • max: 512 tokens
  • Samples:
    question context negative_1
    When did Simon Cowell announce he was no longer going to be a judge? In season eight, Latin Grammy Award-nominated singer–songwriter and record producer Kara DioGuardi was added as a fourth judge. She stayed for two seasons and left the show before season ten. Paula Abdul left the show before season nine after failing to agree terms with the show producers. Emmy Award-winning talk show host Ellen DeGeneres replaced Paula Abdul for that season, but left after just one season. On January 11, 2010, Simon Cowell announced that he was leaving the show to pursue introducing the American version of his show The X Factor to the USA for 2011. Jennifer Lopez and Steven Tyler joined the judging panel in season ten, but both left after two seasons. They were replaced by three new judges, Mariah Carey, Nicki Minaj and Keith Urban, who joined Randy Jackson in season 12. However both Carey and Minaj left after one season, and Randy Jackson also announced that he would depart the show after twelve seasons as a judge but would return as a mentor. Urban is the only judge... A special tribute to Simon Cowell was presented in the finale for his final season with the show. Many figures from the show's past, including Paula Abdul, made an appearance.
    When did Hodgson publish his DNA study? According to an autosomal DNA study by Hodgson et al. (2014), the Afro-Asiatic languages were likely spread across Africa and the Near East by an ancestral population(s) carrying a newly identified non-African genetic component, which the researchers dub the "Ethio-Somali". This Ethio-Somali component is today most common among Afro-Asiatic-speaking populations in the Horn of Africa. It reaches a frequency peak among ethnic Somalis, representing the majority of their ancestry. The Ethio-Somali component is most closely related to the Maghrebi non-African genetic component, and is believed to have diverged from all other non-African ancestries at least 23,000 years ago. On this basis, the researchers suggest that the original Ethio-Somali carrying population(s) probably arrived in the pre-agricultural period from the Near East, having crossed over into northeastern Africa via the Sinai Peninsula. The population then likely split into two branches, with one group heading westward toward ... Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid (DNA) was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s. The structure of DNA was studied by Rosalind Franklin using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication. Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics.
    What campus did Yale buy in 2008? Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island. Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3536
0.2890 100 0.6663 0.7830 0.3922
0.5780 200 0.4374 0.7399 0.4064
0.8671 300 0.4048 0.7294 0.4086
1.1561 400 0.3149 0.7244 0.4136
1.4451 500 0.2378 0.7246 0.4182
1.7341 600 0.2358 0.7179 0.4158
2.0231 700 0.2338 0.7170 0.4240
2.3121 800 0.1602 0.7293 0.4148
2.6012 900 0.1595 0.7237 0.4230
2.8902 1000 0.1545 0.7229 0.4146
-1 -1 - - 0.4192

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}