Jade-ModernBert-FT / README.md

lwoollett

metadata

language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:10217
  - loss:CachedMultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
  - source_sentence: >-
      What integer value is assigned to the global constant SDS_SecondaryType in
      JADE?
    sentences:
      - >-
        #### drawWidth


        **Type:** - Integer


        **Availability:** - Read or write at run time only


        The **drawWidth **property of the
        [Window](../window_class/window_class.htm) class contains the line width
        for output from graphics methods on a form or control.


        Set the **drawWidth** property to a value in the range **1** through
        **32,767**.  This value represents the width of the line in pixels.  The
        default value is **1** pixel wide.


        Increase the value of the **drawWidth** property to increase the width
        of the line.
      - >-
        #### JadeDynamicObjectTypes Category Global Constants


        The global constants listed in the following table define symbolic names
        for the values of the
        [JadeDynamicObject](../../encyclosys1/jadedynamicobject_class/jadedynamicobject_class.htm#jadedynamicobjectclass)
        class
        [type](../../encyclosys1/jadedynamicobject_class/type.htm#typejadedynamicobject)
        attribute of dynamic objects returned from
        [JadeDatabaseAdmin](../../encyclosys1/jadedatabaseadmin_class/jadedatabaseadmin_class.htm#jadedatabaseadminclass)
        class query methods.


        | Global Constant | Integer Value |

        | ---- | ---- |

        | SDS_PrimaryType | 1 |

        | SDS_SecondaryProxyType | 2 |

        | SDS_SecondaryType | 3 |

        | SDS_TransactionType | 4 |
      - "#### sortOrder\n\n**Type:** - Integer\n\n**Availability:** - Read or write at run time only\n\nThe **sortOrder **property of the [JadeTableColumn](jadetablecolumn_class.htm) class contains the precedence of the column referenced by this object when sorting, in the range **1** through **3**, or it contains zero (**0**) to remove sorting on the current column.\n\nFor a description of this property, see the [Table](../../encyclowin/control_class/table_class.htm#tableclass) control [sortColumn](../../encyclowin/window__form__and_control_properties/sortcolumn.htm#sortcolumnwin) property.  See also the [JadeTableColumn](jadetablecolumn_class.htm) class [sortAsc](sortasc.htm), [sortCased](sortcased.htm), and [sortType](sorttype.htm) properties, which are dependent on the column already being recorded as a sort column by the **sortOrder** property.\n\nThe code fragment in the following example shows the use of the **sortOrder** property.\n\n```\ntable1.accessColumn(2).sortOrder := 1;   // first column in sort\r\ntable1.accessColumn(4).sortOrder := 2;   // second column\r\ntable1.accessColumn(5).sortOrder := 3;   // third column\n```"
  - source_sentence: How are values in the ByteArray referenced?
    sentences:
      - "#### findAllElementsByNameNS\n\n```\nfindAllElementsByNameNS(namespaceURI: String;\r\n                        localName:    String;\r\n                        elements:     JadeXMLElementArray input);\n```\nThe **findAllElementsByNameNS **method of the [JadeXMLElement](jadexmlelement_class.htm) class fills the elements array with all descendant elements that have the values specified in the **namespaceURI** and **localName** parameters, respectively.\n\nAs the search uses the collection sequence, the elements may not be in the document sequence.\n\nIf you want to match all namespaces or local names, specify an asterisk character (**'*'**) in the **namespaceURI** or **localName** parameter.  Note, however, that if you specify **\"*\"** in the **localName** parameter, the access method uses the document sequence to locate the requested elements rather than the collection sequence that optimizes performance."
      - >-
        ## ByteArray Class


        The **ByteArray** class is an ordered collection of
        [Byte](../../encycloprim/byte_type/byte_type.htm#byte) values in which
        the values are referenced by their position in the collection.


        Byte arrays inherit the methods defined in the
        [Array](../array_class/array_class.htm) class.


        The bracket (**[ ]**) subscript operators enable you to assign values to
        and receive values from a **Byte** array.


        For details about the methods defined in the **ByteArray** class, see
        "[ByteArray Methods](bytearray_methods.htm)", in the following section.


        [Array](../array_class/array_class.htm)


        (None)
      - >-
        #### Exposing Properties for a Selected Class


        To expose all properties for a selected class


        - Right‑click on the class row in the **Classes** table and then select
        the **Expose Properties for Selected Class** command from the popup menu
        that is displayed.


        This command does _not_ automatically add methods or constants to the C#
        exposure, even if the **Show Methods** or **Show Constants** option is
        checked.  (For details, see "[Toggling the Display of
        Methods](toggling_the_display_of_methods.htm)" or "[Toggling the Display
        Constants](toggling_the_display_of_constants.htm)", later in this
        chapter.)


        All properties in that class are then exposed for inclusion in the C#
        exposure; that is, each property check box in the **Features** pane is
        checked, indicating that the properties for that class will be generated
        in the C# class library.


        You can tailor the property selection by unchecking the check box of any
        property that you want to exclude from the exposure.
  - source_sentence: How can you resolve opening database error 14544 in single user mode?
    sentences:
      - "#### Changing Lock Type\n\nA type upgrade can queue and potentially time out, causing a [JoobObjectLockedException](joobobjectlockedexception.htm) to be thrown, if the requested type is not compatible with existing locks. For example, this could happen when upgrading a shared lock to exclusive.\n\nLock type downgrades will never be queued, as the strength is being lowered so there will be no lock incompatibilities.\n\nWhen a Jade session is in transaction state, requests to downgrade lock type are ignored. The lock maintains its current type. However, lock types can be upgraded regardless of transaction state.\n\nWhen a lock type is being upgraded from shared to update, the object is unlocked before the update lock is requested. This happens even if the Jade session is in transaction state, and is the only situation where an object is unlocked while in transaction state. The reason for doing this is to prevent potential deadlocks, as discussed in more detail under \"[Avoiding Deadlock Exceptions](avoiding_deadlock_exceptions.htm)\", later in this chapter.\n\nThe following code fragment gives examples of upgrading and downgrading lock types.\n\n```\nTimeSpan timeOut = TimeSpan.FromSeconds(10);\r\ncontext.Lock(obj1, LockType.Shared, LockDuration.Transaction, timeOut);\r\ncontext.Lock(obj1, LockType.Reserve, LockDuration.Transaction, timeOut);\r\n                                // The lock is now upgraded from shared to reserve.\r\ncontext.Lock(coll, LockType.Exclusive, LockDuration.Transaction, timeOut);\r\n                   \r\nusing (System.Data.IDbTransaction tran = context.BeginTransaction())\r\n{\r\n    context.Lock(obj1, LockType.Exclusive, LockDuration.Transaction,\r\n                       timeOut); // The lock type is upgraded to exclusive, as\r\n                                 // locks can be upgraded (but not downgraded)\r\n                                 // when in transaction state.\r\n    foreach (C1 obj2 in coll)\r\n    {\r\n        // The 
exclusive lock on coll is not downgraded by the implicit shared\r\n        // lock associated with foreach, because transaction state is in effect.\r\n    }\r\n    context.Lock(obj1, LockType.Shared, LockDuration.Transaction, timeOut);\r\n                      // The lock type is not downgraded, but remains as exclusive.\r\n    tran.Commit();    // All transaction duration locks are released.\r\n}\n```"
      - >-
        ### 1411 - Attempt to add unknown system file


        Cause


        This error occurs if the system schema maintenance function attempts to
        add a new unknown system file.


        Action


        This is an internal error.  If your Jade licenses include support,
        contact your local Jade support center or Jade Support.
      - >-
        ### 14544 - A concurrent process has already opened the same database


        Cause


        This error occurs if you attempt to open a database that is already open
        in single user (exclusive) mode.


        Action


        Determine in which mode the database should be opened; that is, single
        user or multiuser mode.
  - source_sentence: What is the cause of the 3323 DbCrypt error?
    sentences:
      - >-
        ### 3323 - DbCrypt memory allocation failure


        Cause


        This error occurs if a memory allocation error occurs in the use of the
        database encryption module.


        Action


        If your Jade licenses include support, contact your local Jade support
        center or Jade Support.
      - >-
        ### 3028 - Database file is in use by another process


        Cause


        This error occurs if you attempt to open a database file that is already
        open by another process.


        Action


        Refer to the Jade messages log file (**jommsg.log**) for information
        about the file.  Generally, another program is accessing the file or the
        database as a whole.
      - >-
        ### Where Do Jade Methods Execute?


        Jade methods execute only in Jade nodes. A Jade node is the fundamental
        building block of Jade's distributed architecture. Each node contains
        the Jade Object Manager (JOM), the Jade Interpreter, various caches, and
        one or more Jade processes.


        The Jade thin client is _not_ a Jade node; Jade methods do not execute
        there, although a great deal of effort has been expended to make it look
        as though they do.


        In most production systems, there is one database server node
        (**jadrap.exe**, **jadrapb.exe**, or **jadserv.exe**), one or more
        application server nodes (**jadapp.exe** or **jadappb.exe**), and one or
        more fat/standard client nodes (**jade.exe**) for background processing,
        web services, or HTML forms.


        When **jade.exe** is run in single user mode, there is one node only.
  - source_sentence: Which subclasses are associated with the JadeXMLCharacterData class?
    sentences:
      - >-
        ## JadeXMLCharacterData Class


        The **JadeXMLCharacterData** class is the abstract superclass of
        character-based nodes in an XML document tree; that is, the text,
        **CDATA**, and comment nodes.


        For details about the property defined in the **JadeXMLCharacterData**
        class, see "[JadeXMLCharacterData
        Property](jadexmlcharacterdata_property.htm)", in the following section.


        [JadeXMLNode](../jadexmlnode_class/jadexmlnode_class.htm)


        [JadeXMLCDATA](../jadexmlcdata_class/jadexmlcdata_class.htm),
        [JadeXMLComment](../jadexmlcomment_class/jadexmlcomment_class.htm),
        [JadeXMLText](../jadexmltext_class/jadexmltext_class.htm)
      - "### Minimizing the Working Set\n\nIn loops where there are multiple filters, apply the cheapest filters first and then the filters that reduce the working set the most. For example, consider the following code fragment, which finds sales of appliances in a specified city.\n\n```\nwhile iter.next(tran) do\r\n    if  tran.type = Type_Sale\r\n    and tran.myBranch.myLocation.city = targetCity\r\n    and tran.myProduct.isAppliance then\r\n        <do something with tran>\r\n    endif;\r\nendwhile;\n```\nIn this example, **tran.type** should be checked first, because it is the cheapest. The **tran** object must be fetched to evaluate all of the other conditions, so we may as well check the **type** attribute first. If we did the **isAppliance** check first, we would have to fetch all of the product objects for the transactions that were not sales. Regardless of how many transactions are sales and how many products are appliances, it will save time to check **tran.type** first.\n\nNow, assume that:\n\n- 80 percent of transactions are sales\n\n- 15 percent, on average, are likely to be in the target city\n\n- 90 percent of the products are appliances\n\nIt pays to check the city first, even though it means fetching the branch and location objects for the non‑appliance products. There are very few non‑appliance products, so the number of extra fetches is small. 
By contrast, checking for non‑appliance products for all other cities would result in a large number of extra fetches.\n\nIt doesn't matter if the filters are conditions of an [if](../../devref/ch1languageref/if_instruction.htm#if) instruction, multiple [if](../../devref/ch1languageref/if_instruction.htm#if) instructions, or multiple conditions in the [where](../../devref/ch1languageref/where_clause_optimization.htm#whereoptimization) clause of a [while](../../devref/ch1languageref/while_instruction.htm#while) statement; the end result is the same.\n\nThis code fragment example is simple and concise, to convey the concept. In the real world, each successive filter may be in another method, another class, or even another schema. It may take a bit of investigation to find all of the filters involved in a single loop."
      - >-
        ##### responseType


        Use the **responseType** parameter of the
        [beginNotification](beginnotification.htm) method to specify the
        frequency with which the subscribed event was notified.


        The valid values for the **responseType** parameter, represented by
        global constants in the
        [NotificationResponses](../../encycloprim/appaglobalconstants/notificationresponses_category.htm#notificationresponsescategory)
        category, are listed in the following table.


        | Global Constant | Integer Value | Sends a notification… |

        | ---- | ---- | ---- |

        | Response_Cancel | 1 | When the object receives a matching event and
        then cancels the notification |

        | Response_Continuous | 0 | Whenever the object receives a matching
        event |

        | Response_Suspend | 2 | When the object receives a matching event and
        then suspends notification until the user refreshes the local copy of
        the object |
pipeline_tag: sentence-similarity
library_name: sentence-transformers

Beep boop

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the jade_embeddings_train_25.04.04 dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: nomic-ai/modernbert-embed-base
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- jade_embeddings_train_25.04.04
Language: en
License: apache-2.0

Model Sources

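Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)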

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
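
The Pooling and Normalize modules listed above can be illustrated with a small NumPy sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, which is what the printed configuration (`pooling_mode_mean_tokens: True`, then `Normalize()`) suggests. The function name and the toy 2-dimensional inputs are illustrative only and are not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) tokens, then L2-normalize.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum of real token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # number of real tokens
    pooled = summed / counts                            # mean pooling
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length sentence vectors

# Toy input: one sentence, three token slots, the last one is padding
tokens = np.array([[[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
sentence_vec = mean_pool_and_normalize(tokens, mask)
print(sentence_vec)
```

Because the output vectors have unit length, their dot product equals their cosine similarity, which is consistent with the similarity function listed in the model description.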

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("lwoollett/jade-ft-14-bert-static")
# Run inference
sentences = [
    'Which subclasses are associated with the JadeXMLCharacterData class?',
    '## JadeXMLCharacterData Class\n\nThe **JadeXMLCharacterData** class is the abstract superclass of character-based nodes in an XML document tree; that is, the text, **CDATA**, and comment nodes.\n\nFor details about the property defined in the **JadeXMLCharacterData** class, see "[JadeXMLCharacterData Property](jadexmlcharacterdata_property.htm)", in the following section.\n\n[JadeXMLNode](../jadexmlnode_class/jadexmlnode_class.htm)\n\n[JadeXMLCDATA](../jadexmlcdata_class/jadexmlcdata_class.htm), [JadeXMLComment](../jadexmlcomment_class/jadexmlcomment_class.htm), [JadeXMLText](../jadexmltext_class/jadexmltext_class.htm)',
    "### Minimizing the Working Set\n\nIn loops where there are multiple filters, apply the cheapest filters first and then the filters that reduce the working set the most. For example, consider the following code fragment, which finds sales of appliances in a specified city.\n\n```\nwhile iter.next(tran) do\r\n    if  tran.type = Type_Sale\r\n    and tran.myBranch.myLocation.city = targetCity\r\n    and tran.myProduct.isAppliance then\r\n        <do something with tran>\r\n    endif;\r\nendwhile;\n```\nIn this example, **tran.type** should be checked first, because it is the cheapest. The **tran** object must be fetched to evaluate all of the other conditions, so we may as well check the **type** attribute first. If we did the **isAppliance** check first, we would have to fetch all of the product objects for the transactions that were not sales. Regardless of how many transactions are sales and how many products are appliances, it will save time to check **tran.type** first.\n\nNow, assume that:\n\n- 80 percent of transactions are sales\n\n- 15 percent, on average, are likely to be in the target city\n\n- 90 percent of the products are appliances\n\nIt pays to check the city first, even though it means fetching the branch and location objects for the non‑appliance products. There are very few non‑appliance products, so the number of extra fetches is small. "
    "By contrast, checking for non‑appliance products for all other cities would result in a large number of extra fetches.\n\nIt doesn't matter if the filters are conditions of an [if](../../devref/ch1languageref/if_instruction.htm#if) instruction, multiple [if](../../devref/ch1languageref/if_instruction.htm#if) instructions, or multiple conditions in the [where](../../devref/ch1languageref/where_clause_optimization.htm#whereoptimization) clause of a [while](../../devref/ch1languageref/while_instruction.htm#while) statement; the end result is the same.\n\nThis code fragment example is simple and concise, to convey the concept. In the real world, each successive filter may be in another method, another class, or even another schema. It may take a bit of investigation to find all of the filters involved in a single loop.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
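
For semantic search, the usual pattern is to embed one query and many documents and rank the documents by cosine similarity. The sketch below shows that ranking step with NumPy only. The helper name `rank_by_cosine` and the toy 3-dimensional vectors are assumptions made for illustration; with the real model the inputs would be the 768-dimensional arrays returned by `model.encode`.

```python
import numpy as np

def rank_by_cosine(query_embedding, doc_embeddings, top_k=3):
    """Return (document index, cosine score) pairs, best match first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity of each document to the query
    order = np.argsort(-scores)[:top_k]     # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy stand-ins for real embeddings
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 0.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
results = rank_by_cosine(query, docs, top_k=2)
print(results)
```

Applied to the example above, where the first sentence is a question and the other two are documentation passages, the call would be `rank_by_cosine(embeddings[0], embeddings[1:])`.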

Training Details

Training Dataset

jade_embeddings_train_25.04.04

Dataset: jade_embeddings_train_25.04.04
Size: 10,217 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 8 tokens, mean: 17.17 tokens, max: 30 tokens	min: 27 tokens, mean: 363.15 tokens, max: 6303 tokens

Samples:

anchor	positive
`What is the format for defining a Byte constant in JADE?`	##### Constant Definition Tips When defining a constant value, the value of a constant can be a simple literal value or an expression constructed using literals and other constants. For details about literal types, see "Literals", in Chapter - 1 of the Developer's Reference. You can define the value for a constant whose primitive type is not a specific literal format by using a typecast of a String literal or in the case of a Byte, a small Integer literal, as shown in the examples in the following table.
`How does the replaceFrom__ method handle case sensitivity?`	#### replaceFrom__ ``` replaceFrom__(target: String; replacement: String; startIndex: Integer; bIgnoreCase: Boolean): String; ``` The replaceFrom__ method of the String primitive type replaces only the first occurrence of the substring specified in the target parameter with the substring specified in the replacement parameter, starting from the specified startIndex parameter. Case‑sensitivity is ignored if you set the value of the bIgnoreCase parameter to true. Set this parameter to false if you want the substring replacement to be case‑sensitive. This method raises exception 1413 (Index used in string operation is out of bounds) if the value specified in the startIndex parameter is less than 1 or it is greater than the length of the original string. In addition, it returns the original receiver String if the value specified in the target parameter has a length of zero (**...
`What does the global constant Ex_Continue do?`	`## Exceptions Category The global constants for exceptions are listed in the following table.`

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 32
}
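
To make the loss parameters concrete, here is a minimal NumPy sketch of the multiple negatives ranking objective with `scale = 20.0` and cosine similarity. Each anchor is scored against every positive in the batch, and the loss is the cross-entropy of picking its own positive. This is an illustration under those assumptions, not the library implementation: the cached variant additionally processes the batch in chunks of `mini_batch_size` with gradient caching to limit memory use, which this sketch omits.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """In-batch contrastive loss: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # correct pairs lie on the diagonal

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
good = multiple_negatives_ranking_loss(anchors, anchors)        # every anchor matches its positive
bad = multiple_negatives_ranking_loss(anchors, anchors[::-1])   # positives swapped
print(good, bad)
```

With perfectly matched pairs the loss is close to zero. With swapped pairs it is close to the scale value, which shows how `scale` sharpens the penalty for ranking a wrong passage above the right one.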

Evaluation Dataset

jade_embeddings_train_25.04.04

Dataset: jade_embeddings_train_25.04.04
Size: 1,136 evaluation samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 8 tokens, mean: 17.07 tokens, max: 41 tokens	min: 25 tokens, mean: 365.93 tokens, max: 3397 tokens

Samples:

anchor	positive
`What is the keyword list constant value for JADE_SYSTEMVARS?`	### changeKeywords ``` changeKeywords(action: Integer; keywordList: Integer; keywords: String); ``` The changeKeywords method of the JadeTextEdit class modifies one or more of the current keyword lists. The keyword lists are used by the current language lexical analyzer to classify the tokens found in the text. For the Jade language, this includes keywords, class names, constant names, and so on. The value of the action parameter can be one of the JadeTextEdit class constants listed in the following table. Class Constant
`What should you click to abandon the deletion of a report in JADE?`	#### Delete Report Command Use the Delete Report command from the File menu to delete a report. To delete a report 1. Select the Delete Report command from the File menu. The Delete Report dialog, shown in the following image, is then displayed. 2. Select the report that you want to delete from the Report list box or enter the name in the Report name text box. 3. Filter the list of report names in the Reports list box in one or both of the following ways. - To display only those reports that contain that text in their report description, enter text in the Text contains text box. For example, only those reports that mention Pay in their description are displayed if you enter Pay, providing a refined selection list. - To display only those reports modified during a specified period, select a last modified period from the Last modified list box. For example, only those reports that were modified in...
`What types of objects can be set for the userGroupObject in JadeMultiWorkerTcpTransport?`	#### userGroupObject Type: - Object The userGroupObject property of the JadeMultiWorkerTcpTransport class contains a reference to an object that you can associate with the transport group between event callbacks. You must set the value of this property to a shared transient or a persistent object, as it must be visible to other workers. The default value is null. To prevent an object leak, it is your responsibility to delete this object, if required, in your implementation of the closedEvent method in the receiver class.

Loss: CachedMultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 32
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 18
per_device_eval_batch_size: 18
num_train_epochs: 4
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
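
The batch size, epoch count, and warmup ratio above determine the step counts and the learning-rate curve. The sketch below works through that arithmetic, assuming a single device, no gradient accumulation, the 10,217 training samples reported earlier, and the learning rate (5e-05) and linear scheduler listed under All Hyperparameters. The function `linear_schedule_lr` is a hypothetical helper written for this example, not a library call.

```python
import math

TRAIN_SAMPLES = 10_217   # size of the training dataset
BATCH_SIZE = 18          # per_device_train_batch_size
EPOCHS = 4               # num_train_epochs
WARMUP_RATIO = 0.1       # warmup_ratio
BASE_LR = 5e-5           # learning_rate

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_schedule_lr(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    return BASE_LR * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
print(linear_schedule_lr(100), linear_schedule_lr(warmup_steps), linear_schedule_lr(total_steps))
```

Under these assumptions the run has 568 steps per epoch and 2,272 steps in total, with the learning rate peaking after the first 228 steps.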

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 18
per_device_eval_batch_size: 18
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.1761	100	0.0851	0.0243
0.3521	200	0.0262	0.0211
0.5282	300	0.0275	0.0217
0.7042	400	0.0216	0.0256
0.8803	500	0.0283	0.0241
1.0563	600	0.0226	0.0195
1.2324	700	0.0113	0.0170
1.4085	800	0.0114	0.0204
1.5845	900	0.0165	0.0182
1.7606	1000	0.0129	0.0219
1.9366	1100	0.0126	0.0181
2.1127	1200	0.0069	0.0207
2.2887	1300	0.0045	0.0212
2.4648	1400	0.0046	0.0187
2.6408	1500	0.0056	0.0206
2.8169	1600	0.0084	0.0196
2.9930	1700	0.005	0.0214
3.1690	1800	0.0056	0.0202
3.3451	1900	0.0088	0.0190
3.5211	2000	0.0026	0.0202
3.6972	2100	0.0064	0.0205
3.8732	2200	0.006	0.0202
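
A short script can help when reading the log. The sketch below copies the rows of the table above, finds the evaluation step with the lowest validation loss, and recomputes the Epoch column from the step number. The value of 568 steps per epoch is an assumption derived from 10,217 samples at batch size 18 on a single device.

```python
# (step, training_loss, validation_loss) rows copied from the training log
LOG = [
    (100, 0.0851, 0.0243), (200, 0.0262, 0.0211), (300, 0.0275, 0.0217),
    (400, 0.0216, 0.0256), (500, 0.0283, 0.0241), (600, 0.0226, 0.0195),
    (700, 0.0113, 0.0170), (800, 0.0114, 0.0204), (900, 0.0165, 0.0182),
    (1000, 0.0129, 0.0219), (1100, 0.0126, 0.0181), (1200, 0.0069, 0.0207),
    (1300, 0.0045, 0.0212), (1400, 0.0046, 0.0187), (1500, 0.0056, 0.0206),
    (1600, 0.0084, 0.0196), (1700, 0.0050, 0.0214), (1800, 0.0056, 0.0202),
    (1900, 0.0088, 0.0190), (2000, 0.0026, 0.0202), (2100, 0.0064, 0.0205),
    (2200, 0.0060, 0.0202),
]
STEPS_PER_EPOCH = 568  # assumed: ceil(10217 / 18)

# Evaluation point with the lowest validation loss
best_step, _, best_val = min(LOG, key=lambda row: row[2])

# Epoch column recomputed from the step counter
epochs = [round(step / STEPS_PER_EPOCH, 4) for step, _, _ in LOG]

print(best_step, best_val)
print(epochs[0], epochs[-1])
```

The lowest validation loss in the log is at step 700, while the training loss keeps falling afterwards. Since `load_best_model_at_end` is False, the published weights presumably come from the end of training rather than from that step.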

Framework Versions

Python: 3.11.11
Sentence Transformers: 4.0.2
Transformers: 4.51.0
PyTorch: 2.8.0.dev20250319+cu128
Accelerate: 1.6.0
Datasets: 3.5.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}