metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:44282
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: Does strength come from the walls or cavities of wood cells?
    sentences:
      - >-
        In temperate softwoods there often is a marked difference between
        latewood and earlywood. The latewood will be denser than that formed
        early in the season. When examined under a microscope the cells of dense
        latewood are seen to be very thick-walled and with very small cell
        cavities, while those formed first in the season have thin walls and
        large cell cavities. The strength is in the walls, not the cavities.
        Hence the greater the proportion of latewood the greater the density and
        strength. In choosing a piece of pine where strength or stiffness is the
        important consideration, the principal thing to observe is the
        comparative amounts of earlywood and latewood. The width of ring is not
        nearly so important as the proportion and nature of the latewood in the
        ring.
      - >-
        In temperate softwoods there often is a marked difference between
        latewood and earlywood. The latewood will be denser than that formed
        early in the season. When examined under a microscope the cells of dense
        latewood are seen to be very thick-walled and with very small cell
        cavities, while those formed first in the season have thin walls and
        large cell cavities. The strength is in the walls, not the cavities.
        Hence the greater the proportion of latewood the greater the density and
        strength. In choosing a piece of pine where strength or stiffness is the
        important consideration, the principal thing to observe is the
        comparative amounts of earlywood and latewood. The width of ring is not
        nearly so important as the proportion and nature of the latewood in the
        ring.
      - >-
        All of these elements struck a chord with the older readers, such as
        college-aged adults, and they successfully gained in a way not seen
        before. In 1965, Spider-Man and the Hulk were both featured in Esquire
        magazine's list of 28 college campus heroes, alongside John F. Kennedy
        and Bob Dylan. In 2009 writer Geoff Boucher reflected that, "Superman
        and DC Comics instantly seemed like boring old Pat Boone; Marvel felt
        like The Beatles and the British Invasion. It was Kirby's artwork with
        its tension and psychedelia that made it perfect for the times—or was it
        Lee's bravado and melodrama, which was somehow insecure and brash at the
        same time?"
  - source_sentence: What was Brian May's zodiac sign?
    sentences:
      - >-
        After the decline of Aksum, the Eritrean highlands were under the domain
        of Bahr Negash ruled by the Bahr Negus. The area was then known as
        Ma'ikele Bahr ("between the seas/rivers," i.e. the land between the Red
        Sea and the Mereb river). It was later renamed under Emperor Zara Yaqob
        as the domain of the Bahr Negash, the Medri Bahri ("Sea land" in
        Tingrinya, although it included some areas like Shire on the other side
        of the Mereb, today in Ethiopia). With its capital at Debarwa, the
        state's main provinces were Hamasien, Serae and Akele Guzai.
      - >-
        In 1963, the teenage Brian May and his father custom-built his signature
        guitar Red Special, which was purposely designed to feedback. Sonic
        experimentation figured heavily in Queen's songs. A distinctive
        characteristic of Queen's music are the vocal harmonies which are
        usually composed of the voices of May, Mercury, and Taylor best heard on
        the studio albums A Night at the Opera and A Day at the Races. Some of
        the ground work for the development of this sound can be attributed to
        their former producer Roy Thomas Baker, and their engineer Mike Stone.
        Besides vocal harmonies, Queen were also known for multi-tracking voices
        to imitate the sound of a large choir through overdubs. For instance,
        according to Brian May, there are over 180 vocal overdubs in "Bohemian
        Rhapsody". The band's vocal structures have been compared with the Beach
        Boys, but May stated they were not "much of an influence".
      - >-
        Having attended art college, Mercury also designed Queen's logo, called
        the Queen crest, shortly before the release of the band's first album.
        The logo combines the zodiac signs of all four members: two lions for
        Leo (Deacon and Taylor), a crab for Cancer (May), and two fairies for
        Virgo (Mercury). The lions embrace a stylised letter Q, the crab rests
        atop the letter with flames rising directly above it, and the fairies
        are each sheltering below a lion. There is also a crown inside the Q and
        the whole logo is over-shadowed by an enormous phoenix. The whole symbol
        bears a passing resemblance to the Royal coat of arms of the United
        Kingdom, particularly with the lion supporters. The original logo, as
        found on the reverse-side of the first album cover, was a simple line
        drawing but more intricate colour versions were used on later sleeves.
  - source_sentence: What musician joined Kanye West on the song "Only One"?
    sentences:
      - >-
        Modern Nationalism, as developed especially since the French Revolution,
        has made the distinction between "language" and "dialect" an issue of
        great political importance. A group speaking a separate "language" is
        often seen as having a greater claim to being a separate "people", and
        thus to be more deserving of its own independent state, while a group
        speaking a "dialect" tends to be seen not as "a people" in its own
        right, but as a sub-group, part of a bigger people, which must content
        itself with regional autonomy.[citation needed] The distinction between
        language and dialect is thus inevitably made at least as much on a
        political basis as on a linguistic one, and can lead to great political
        controversy, or even armed conflict.
      - >-
        In June 2013, West and television personality Kim Kardashian announced
        the birth of their first child, North. In October 2013, the couple
        announced their engagement to widespread media attention. November 2013,
        West stated that he was beginning work on his next studio album, hoping
        to release it by mid-2014, with production by Rick Rubin and Q-Tip. In
        December 2013, Adidas announced the beginning of an official apparel
        collaboration with West, to be premiered the following year. In May
        2014, West and Kardashian were married in a private ceremony in
        Florence, Italy, with a variety of artists and celebrities in
        attendance. West released a single, "Only One", featuring Paul
        McCartney, on December 31, 2014. "FourFiveSeconds", a single jointly
        produced with Rihanna and McCartney, was released in January 2015. West
        also appeared on the Saturday Night Live 40th Anniversary Special, where
        he premiered a new song entitled "Wolves", featuring Sia Furler and
        fellow Chicago rapper, Vic Mensa. In February 2015, West premiered his
        clothing collaboration with Adidas, entitled Yeezy Season 1, to
        generally positive reviews. This would include West's Yeezy Boost
        sneakers. In March 2015, West released the single "All Day" featuring
        Theophilus London, Allan Kingdom and Paul McCartney. West performed the
        song at the 2015 BRIT Awards with a number of US rappers and UK grime
        MC's including: Skepta, Wiley, Novelist, Fekky, Krept & Konan, Stormzy,
        Allan Kingdom, Theophilus London and Vic Mensa. He would premiere the
        second iteration of his clothing line, Yeezy Season 2, in September 2015
        at New York Fashion Week.
      - >-
        As of 2013, West has won a total of 21 Grammy Awards, making him one of
        the most awarded artists of all-time. About.com ranked Kanye West No. 8
        on their "Top 50 Hip-Hop Producers" list. On May 16, 2008, Kanye West
        was crowned by MTV as the year's No. 1 "Hottest MC in the Game." On
        December 17, 2010, Kanye West was voted as the MTV Man of the Year by
        MTV. Billboard ranked Kanye West No. 3 on their list of Top 10 Producers
        of the Decade. West ties with Bob Dylan for having topped the annual
        Pazz & Jop critic poll the most number of times ever, with four
        number-one albums each. West has also been included twice in the Time
        100 annual lists of the most influential people in the world as well as
        being listed in a number of Forbes annual lists.
  - source_sentence: What was Berners-Lee a director of?
    sentences:
      - >-
        The first web browser was invented in 1990 by Sir Tim Berners-Lee.
        Berners-Lee is the director of the World Wide Web Consortium (W3C),
        which oversees the Web's continued development, and is also the founder
        of the World Wide Web Foundation. His browser was called WorldWideWeb
        and later renamed Nexus.
      - >-
        A particular criticism of the Buddha was Vedic animal sacrifice.[web 18]
        He also mocked the Vedic "hymn of the cosmic man". However, the Buddha
        was not anti-Vedic, and declared that the Veda in its true form was
        declared by "Kashyapa" to certain rishis, who by severe penances had
        acquired the power to see by divine eyes. He names the Vedic rishis, and
        declared that the original Veda of the rishis[note 25] was altered by a
        few Brahmins who introduced animal sacrifices. The Buddha says that it
        was on this alteration of the true Veda that he refused to pay respect
        to the Vedas of his time. However, he did not denounce the union with
        Brahman,[note 26] or the idea of the self uniting with the Self. At the
        same time, the traditional Hindu itself gradually underwent profound
        changes, transforming it into what is recognized as early Hinduism.
      - >-
        The first web browser was invented in 1990 by Sir Tim Berners-Lee.
        Berners-Lee is the director of the World Wide Web Consortium (W3C),
        which oversees the Web's continued development, and is also the founder
        of the World Wide Web Foundation. His browser was called WorldWideWeb
        and later renamed Nexus.
  - source_sentence: Who wrote 182 papers dealing with the hibernation of swallows?
    sentences:
      - >-
        This means that a 5% reduction in operating voltage will more than
        double the life of the bulb, at the expense of reducing its light output
        by about 16%. This may be a very acceptable trade off for a light bulb
        that is in a difficult-to-access location (for example, traffic lights
        or fixtures hung from high ceilings). Long-life bulbs take advantage of
        this trade-off. Since the value of the electric power they consume is
        much more than the value of the lamp, general service lamps emphasize
        efficiency over long operating life. The objective is to minimize the
        cost of light, not the cost of lamps. Early bulbs had a life of up to
        2500 hours, but in 1924 a cartel agreed to limit life to 1000 hours.
        When this was exposed in 1953, General Electric and other leading
        American manufacturers were banned from limiting the life.
      - >-
        Aristotle however suggested that swallows and other birds hibernated.
        This belief persisted as late as 1878, when Elliott Coues listed the
        titles of no less than 182 papers dealing with the hibernation of
        swallows. Even the "highly observant" Gilbert White, in his posthumously
        published 1789 The Natural History of Selborne, quoted a man's story
        about swallows being found in a chalk cliff collapse "while he was a
        schoolboy at Brighthelmstone", though the man denied being an
        eyewitness. However, he also writes that "as to swallows being found in
        a torpid state during the winter in the Isle of Wight or any part of
        this country, I never heard any such account worth attending to", and
        that if early swallows "happen to find frost and snow they immediately
        withdraw for a time—a circumstance this much more in favour of hiding
        than migration", since he doubts they would "return for a week or two to
        warmer latitudes".
      - >-
        Aristotle however suggested that swallows and other birds hibernated.
        This belief persisted as late as 1878, when Elliott Coues listed the
        titles of no less than 182 papers dealing with the hibernation of
        swallows. Even the "highly observant" Gilbert White, in his posthumously
        published 1789 The Natural History of Selborne, quoted a man's story
        about swallows being found in a chalk cliff collapse "while he was a
        schoolboy at Brighthelmstone", though the man denied being an
        eyewitness. However, he also writes that "as to swallows being found in
        a torpid state during the winter in the Isle of Wight or any part of
        this country, I never heard any such account worth attending to", and
        that if early swallows "happen to find frost and snow they immediately
        withdraw for a time—a circumstance this much more in favour of hiding
        than migration", since he doubts they would "return for a week or two to
        warmer latitudes".
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on BAAI/bge-base-en-v1.5
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: gooqa dev
          type: gooqa-dev
        metrics:
          - type: cosine_accuracy
            value: 0.4115999937057495
            name: Cosine Accuracy

SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
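
Read top to bottom, the three modules run BERT, keep the hidden state of the first ([CLS]) token as the sentence vector, and scale that vector to unit length. The following is a minimal NumPy sketch of the last two steps only. It uses random arrays in place of real BertModel outputs, and the function name `cls_pool_and_normalize` is hypothetical, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token states to (batch, hidden) unit vectors."""
    # CLS pooling: keep only the first token's hidden state per sequence.
    cls = token_embeddings[:, 0, :]
    # L2 normalization, guarded against division by zero.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Random stand-in for transformer outputs: 3 sentences, 16 tokens, 768 dims.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(3, 16, 768))
sentence_vectors = cls_pool_and_normalize(fake_hidden_states)
print(sentence_vectors.shape)
```

Because every output vector has unit length, a plain dot product between two of them equals their cosine similarity.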

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-2-epochs")
# Run inference
sentences = [
    'Who wrote 182 papers dealing with the hibernation of swallows?',
    'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
    'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
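
For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on toy 3-dimensional vectors so that it runs without downloading the model; the helper `rank_by_cosine` is a hypothetical name. With the real model, `embeddings[0]` would serve as the query and `embeddings[1:]` as the passages.

```python
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, passage_vecs: np.ndarray):
    """Return passage indices ordered from most to least similar, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                 # cosine similarity of each passage to the query
    order = np.argsort(-scores)    # descending order of similarity
    return order, scores

# Toy stand-ins for real 768-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [0.9, 0.1, 0.0],    # nearly parallel to the query
    [-1.0, 0.0, 0.0],   # opposite direction
])
order, scores = rank_by_cosine(query, passages)
print(order.tolist())
```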

Evaluation

Metrics

Triplet

| Metric | Value |
|:-------|:------|
| cosine_accuracy | 0.4116 |
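
As a rough guide to reading this number, the sketch below computes a triplet accuracy under the assumption that a triplet counts as correct only when the anchor is strictly more cosine-similar to its positive than to its negative. The function names are hypothetical and the vectors are toy data, not model outputs.

```python
import numpy as np

def cosine_sim_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    pos = cosine_sim_rows(anchors, positives)
    neg = cosine_sim_rows(anchors, negatives)
    return float(np.mean(pos > neg))

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [1.0, 0.0]])   # second positive is a poor match
negatives = np.array([[0.0, 1.0], [0.1, 0.9]])   # second negative is a close match
accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)
```

Under a strict comparison like this one, a triplet whose negative is identical to its positive, as in the first evaluation sample shown in this card, can never count as correct.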

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,282 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    |         | question | context | negative |
    |:--------|:---------|:--------|:---------|
    | type    | string   | string  | string   |
    | details | min: 7 tokens; mean: 14.79 tokens; max: 42 tokens | min: 28 tokens; mean: 148.38 tokens; max: 494 tokens | min: 31 tokens; mean: 152.75 tokens; max: 512 tokens |
  • Samples:
    | question | context | negative |
    |:---------|:--------|:---------|
    | How long did the Hollywood round air for in season eight of American Idol? | In the first major change to the judging panel, a fourth judge, Kara DioGuardi, was introduced. This was also the first season without executive producer Nigel Lythgoe who left to focus on the international versions of his show So You Think You Can Dance. The Hollywood round was moved to the Kodak Theatre for 2009 and was also extended to two weeks. Idol Gives Back was canceled for this season due to the global recession at the time. | American Idol is an American singing competition series created by Simon Fuller and produced by 19 Entertainment, and is distributed by FremantleMedia North America. It began airing on Fox on June 11, 2002, as an addition to the Idols format based on the British series Pop Idol and has since become one of the most successful shows in the history of American television. The concept of the series is to find new solo recording artists, with the winner being determined by the viewers in America. Winners chosen by viewers through telephone, Internet, and SMS text voting were Kelly Clarkson, Ruben Studdard, Fantasia Barrino, Carrie Underwood, Taylor Hicks, Jordin Sparks, David Cook, Kris Allen, Lee DeWyze, Scotty McCreery, Phillip Phillips, Candice Glover, Caleb Johnson, and Nick Fradiani. |
    | What is one example of how students can benefit from videoconferencing? | Videoconferencing provides students with the opportunity to learn by participating in two-way communication forums. Furthermore, teachers and lecturers worldwide can be brought to remote or otherwise isolated educational facilities. Students from diverse communities and backgrounds can come together to learn about one another, although language barriers will continue to persist. Such students are able to explore, communicate, analyze and share information and ideas with one another. Through videoconferencing, students can visit other parts of the world to speak with their peers, and visit museums and educational facilities. Such virtual field trips can provide enriched learning opportunities to students, especially those in geographically isolated locations, and to the economically disadvantaged. Small schools can use these technologies to pool resources and provide courses, such as in foreign languages, which could not otherwise be offered. | Typical use of the various technologies described above include calling or conferencing on a one-on-one, one-to-many or many-to-many basis for personal, business, educational, deaf Video Relay Service and tele-medical, diagnostic and rehabilitative use or services. New services utilizing videocalling and videoconferencing, such as teachers and psychologists conducting online sessions, personal videocalls to inmates incarcerated in penitentiaries, and videoconferencing to resolve airline engineering issues at maintenance facilities, are being created or evolving on an ongoing basis. |
    | What names were used by The Sun to characterize the French and Germans? | The Sun has been openly antagonistic towards other European nations, particularly the French and Germans. During the 1980s and 1990s, the nationalities were routinely described in copy and headlines as "frogs", "krauts" or "hun". As the paper is opposed to the EU it has referred to foreign leaders who it deemed hostile to the UK in unflattering terms. Former President Jacques Chirac of France, for instance, was branded "le Worm". An unflattering picture of German chancellor Angela Merkel, taken from the rear, bore the headline "I'm Big in the Bumdestag" (17 April 2006). | On 10 October, hostilities began between German and French republican forces near Orléans. At first, the Germans were victorious but the French drew reinforcements and defeated the Germans at the Battle of Coulmiers on 9 November. After the surrender of Metz, more than 100,000 well-trained and experienced German troops joined the German 'Southern Army'. The French were forced to abandon Orléans on 4 December, and were finally defeated at the Battle of Le Mans (10–12 January). A second French army which operated north of Paris was turned back at the Battle of Amiens (27 November), the Battle of Bapaume (3 January 1871) and the Battle of St. Quentin (13 January). |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
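
To make the two parameters concrete: with `similarity_fct` set to cosine similarity and `scale` set to 20, each question is scored against every context in the batch (plus any hard negatives), the cosine scores are multiplied by 20, and a softmax cross-entropy pushes the matching context to the top. The following NumPy sketch illustrates that idea on toy one-hot vectors. It is a simplified illustration, not the library's implementation, and `mnrl_loss` is a hypothetical name.

```python
import numpy as np

def mnrl_loss(anchors, positives, hard_negatives=None, scale=20.0) -> float:
    """In-batch softmax cross-entropy over scaled cosine similarities."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # Candidates for every anchor: all positives in the batch, then all hard negatives.
    candidates = positives if hard_negatives is None else np.vstack([positives, hard_negatives])
    logits = scale * normalize(anchors) @ normalize(candidates).T   # (batch, n_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i is positive i, which sits in column i.
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())

anchors = np.eye(3, 4)                                   # three one-hot "questions"
hard_negatives = np.tile([0.0, 0.0, 0.0, 1.0], (3, 1))   # orthogonal to every anchor
aligned = mnrl_loss(anchors, np.eye(3, 4), hard_negatives)
mismatched = mnrl_loss(anchors, np.roll(np.eye(3, 4), 1, axis=0), hard_negatives)
print(aligned, mismatched)
```

When each anchor matches its own positive the loss is close to zero; when the positives are rotated so that each anchor matches someone else's positive, the loss is close to the scale value of 20.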
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    |         | question | context | negative_1 |
    |:--------|:---------|:--------|:-----------|
    | type    | string   | string  | string     |
    | details | min: 3 tokens; mean: 14.7 tokens; max: 39 tokens | min: 28 tokens; mean: 153.44 tokens; max: 510 tokens | min: 28 tokens; mean: 150.79 tokens; max: 510 tokens |
  • Samples:
    | question | context | negative_1 |
    |:---------|:--------|:-----------|
    | Bilateral treaties are concluded between how many states or entities? | Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed] | Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed] |
    | What period is referred to as a "golden age"? | The term "classical music" did not appear until the early 19th century, in an attempt to distinctly canonize the period from Johann Sebastian Bach to Beethoven as a golden age. The earliest reference to "classical music" recorded by the Oxford English Dictionary is from about 1836. | In European history, the Middle Ages or medieval period lasted from the 5th to the 15th century. It began with the collapse of the Western Roman Empire and merged into the Renaissance and the Age of Discovery. The Middle Ages is the middle period of the three traditional divisions of Western history: Antiquity, Medieval period, and Modern period. The Medieval period is itself subdivided into the Early, the High, and the Late Middle Ages. |
    | What is the name of the area of land around Nanjing? | Nanjing, with a total land area of 6,598 square kilometres (2,548 sq mi), is situated in the heartland of drainage area of lower reaches of Yangtze River, and in Yangtze River Delta, one of the largest economic zones of China. The Yangtze River flows past the west side and then north side of Nanjing City, while the Ningzheng Ridge surrounds the north, east and south side of the city. The city is 300 kilometres (190 mi) west-northwest of Shanghai, 1,200 kilometres (750 mi) south-southeast of Beijing, and 1,400 kilometres (870 mi) east-northeast of Chongqing. The downstream Yangtze River flows from Jiujiang, Jiangxi, through Anhui and Jiangsu to East Sea, north to drainage basin of downstream Yangtze is Huai River basin and south to it is Zhe River basin, and they are connected by the Grand Canal east to Nanjing. The area around Nanjing is called Hsiajiang (下江, Downstream River) region, with Jianghuai (江淮) stressing northern part and Jiangzhe (江浙) stressing southern part. The region is a... | Nanjing ( listen; Chinese: 南京, "Southern Capital") is the city situated in the heartland of lower Yangtze River region in China, which has long been a major centre of culture, education, research, politics, economy, transport networks and tourism. It is the capital city of Jiangsu province of People's Republic of China and the second largest city in East China, with a total population of 8,216,100, and legally the capital of Republic of China which lost the mainland during the civil war. The city whose name means "Southern Capital" has a prominent place in Chinese history and culture, having served as the capitals of various Chinese dynasties, kingdoms and republican governments dating from the 3rd century AD to 1949. Prior to the advent of pinyin romanization, Nanjing's city name was spelled as Nanking or Nankin. Nanjing has a number of other names, and some historical names are now used as names of districts of the city, and among them there is the name Jiangning (江寧), whose former c... |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
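
These settings, together with the learning rate (5e-05) and linear scheduler listed under All Hyperparameters, determine the length of the run. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and a kept final partial batch. The rounding of the warmup length with `math.ceil` is also an assumption, and `linear_schedule_lr` is a hypothetical helper rather than a library function.

```python
import math

train_samples = 44_282   # training set size from this card
batch_size = 128         # per_device_train_batch_size
epochs = 2               # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-05          # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding

def linear_schedule_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
# Epoch fraction reached after 100 optimizer steps.
print(round(100 / steps_per_epoch, 4))
```

Under these assumptions one epoch is 346 steps, which agrees with the epoch value of 0.2890 logged at step 100 in the training logs.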

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

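| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|:------|:-----|:--------------|:----------------|:--------------------------|
| -1 | -1 | - | - | 0.3524 |
| 0.2890 | 100 | 0.638 | 0.7838 | 0.3856 |
| 0.5780 | 200 | 0.4271 | 0.7404 | 0.3988 |
| 0.8671 | 300 | 0.3997 | 0.7197 | 0.4014 |
| 1.1561 | 400 | 0.3068 | 0.7188 | 0.4142 |
| 1.4451 | 500 | 0.2351 | 0.7110 | 0.4106 |
| 1.7341 | 600 | 0.2389 | 0.7110 | 0.4120 |
| -1 | -1 | - | - | 0.4116 |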

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}