Add new SentenceTransformer model

Files changed (10)

1_Pooling/config.json +10 -0
README.md +1022 -0
config.json +47 -0
config_sentence_transformers.json +10 -0
model.safetensors +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +945 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
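
Note on the pooling config above: only `pooling_mode_mean_tokens` is enabled, so the sentence embedding is the average of the token embeddings over non-padding positions. The following is a minimal pure-Python sketch of that idea, not the sentence-transformers implementation. The function name `mean_pool` and the toy 2-dimensional vectors are hypothetical; the model itself uses 768 dimensions per `word_embedding_dimension`.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [t / count for t in total]


# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padded third vector does not affect the result, which is why the mask matters when batching sentences of different lengths.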

README.md ADDED

	@@ -0,0 +1,1022 @@

+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:3619
+- loss:CachedMultipleNegativesRankingLoss
+base_model: nomic-ai/modernbert-embed-base
+widget:
+- source_sentence: What is the meaning of the pattern code 128 in the table?
+  sentences:
+  - "epevents\nThe following table may serve as a quick reference to select certain\n\
+    pattern types of recognized events (i.e. away from CCD edges, bad pixels\netc.):\n\
+    \n   ‘PATTERN‘  Meaning\n  ----------- ---------------------------------------------------\n\
+    \       0      singles\n       1      doubles in Y with Y(main)<Y(secondary)\n\
+    \       2      doubles in X with X(main)<X(secondary)\n       3      doubles in\
+    \ Y with Y(main)>Y(secondary)\n       4      doubles in X with X(main)>X(secondary)\n\
+    \      5–8     triples\n     9–12     quadruples\n      128     singles at CCD\
+    \ window (RAWX=1, RAWX=64, RAWY=200)\n              or close to bad pixels\n \
+    \     205     doubles at CCD window or bad pixels\n      206     triples at CCD\
+    \ window or bad pixels\n      207     quadruples at CCD window or bad pixels\n\
+    \nNote: as of version 6.30.4 PATTERN values of 128 have been changed to 0\n(i.e. $8^{\\\
+    rm th}$ bit is not set anymore for singles), and PATTERN\nvalues of 205 have been\
+    \ changed to 1–4 (i.e. $7^{\\rm th}$ and\n$8^{\\rm th}$ bit are not set anymore\
+    \ for doubles)!\n\nSecondary events of those valid doubles, triples, and quadruples\n\
+    ($`PATTERN`=1..12$) have ${\\rm PATTERN(main)} + 64$ (as listed above),\ni.e. $7^{\\\
+    rm th}$ bit set.\n\n1.  For the pattern codes in ‘PAT_ID‘ and ‘PATTERN‘ the following\n\
+    \    bit-wise storing is used:\n\n        ‘PAT_ID‘         \n      ----------\
+    \ ------- -------------------------------------------\n             bit   value\
+    \ Meaning\n              16   32768 free for additional pattern related flag\n\
+    \              15   16384 – \" –\n              14    8192 – \" –\n          \
+    \    13    4096 PAT_ORI first digit (x-coordinate)\n              12    2048 –\
+    \ \" –\n              11    1024 PAT_ORI second digit (y-coordinate)\n       \
+    \       10     512 – \" –\n               9     256 PAT_IND: 1, ...,  < 512 (telemetry\
+    \ limit)\n             ...     ... – \" –\n               1       1 – \" –\n\n\
+    \        ‘PATTERN‘                                   \n      ----------- -------\
+    \ ------------------------- ----------------------------------\n             \
+    \ bit   value Meaning                   \n                8     128 sign of PAT_TYP\
+    \           \n                7      64 sign of PAT_IND           \n         \
+    \       6      32                           used to flag PAT_TYP  > 4\n      \
+    \          5      16                           if bit 6, then use next 5 bits\n\
+    \                4       8 MOS code numbers 0 - 12   to store PAT_TYP - 5\n  \
+    \              3       4 a combination of          hence, max storage: PAT_TYP\
+    \ = 36\n                2       2 PAT_TYP  ≤ 4 and          – \" –\n         \
+    \       1       1 PAT_ORI                   – \" –\n\n    Note: as of version\
+    \ 6.30.4 PATTERN values of 128 have been changed\n    to 0 (i.e. $8^{\\rm th}$\
+    \ bit is not set anymore for singles), and\n    PATTERN values of 205 have been\
+    \ changed to 1–4 (i.e. $7^{\\rm th}$\n    and $8^{\\rm th}$ bit are not set anymore\
+    \ for doubles)!\n\n2.  Creation of event quality flags in column ‘FLAG‘. Task:\
+    \ epevents\n    makes use of the common MOS/pn event related flag code (see )\
+    \ and\n    uses the following bits (other flags are set by the Task: epframes\n\
+    \    task):\n\n        ‘FLAG‘           \n      -------- --------- -------------------------------------\n\
+    \           bit     value Meaning (information)\n             1       0x2 ‘INVALID_PATTERN‘\n\
+    \             2       0x4 ‘CLOSE_TO_CCD_WINDOW‘\n             5      0x20 ‘CLOSE_TO_ONBOARD_BADPIX‘\n\
+    \             6      0x40 ‘CLOSE_TO_BRIGHTPIX‘ (not on-board)\n             8\
+    \     0x100 ‘CLOSE_TO_DEADPIX‘ (not on-board)\n            16   0x10000 ‘OUT_OF_FOV‘\
+    \  \n\n          bit      value Meaning (rejection)\n      ------- ----------\
+    \ ---------------------\n           19    0x80000 ‘COSMIC_RAY‘\n           21\
+    \   0x200000 ‘ON_BADPIX‘\n           22   0x400000 ‘SECONDARY‘\n           23\
+    \   0x800000 ‘TRAILING‘\n        total   0xfa0000 EPN rejection mask\n"
+  - "rgssources\nThe source data can come from several sources:\n\n-   A source list\
+    \ from a previous run of Task: rgssources (note that\n    from version 5.1, Task:\
+    \ rgssources is now compatible with all\n    earlier source list formats).\n\n\
+    -   The proposed target source.\n\n-   The attitude of the spacecraft.\n\n-  \
+    \ A source list output by either Task: emldetect or Task: eboxdetect.\n\n-   A\
+    \ source position supplied on the command line by the user.\n\nThese are described\
+    \ individually below.\n"
+  - "rgssources\n## Parameters\n\n  \\label{rgssources:description:parameters}\n \
+    \ \n  **filemode}\t{modify** (Optional): no\n(Type: \n    Controls whether the\
+    \ task opens a previous source list for editing or creates a new one.\n    }\n\
+    \  \\optparm{changeprime}\t{no}\t{boolean}\t{yes|no, Default: string}\t{modify|create,\
+    \ Range: \n    Only active in `filemode`=`modify'. Unless this parameter is set,\
+    \ the previous prime source index number is retained.\n    }\n  \\optparm{changeattitude)\t\
+    {boolean}\t{yes|no}{\n    Only active in `filemode`=`modify'. Unless this parameter\
+    \ is set, the previous attitude (stored in the header) is retained.\n    }\n \
+    \ **srclist}\t{rgsset.ds** (Mandatory): yes\n(Type: \n    The name of the rgs\
+    \ source list. If `filemode`=`create', the output is written to this file. If\
+    \ there is an existing file of this name, it will be overwritten unless SAS\\\
+    _CLOBBER is unset. If `filemode`=`modify', the task looks for an existing source\
+    \ list of this name and modifies it.\n  }\n  **instexpid}\t{}\t{string}\t{, Default:\
+    \ dataset}\t{, Range: \n    This parameter contains information about both the\
+    \ instrument (that is, RGS1 or 2) and the exposure identifier (a letter S or U,\
+    \ indicating scheduled or unscheduled, followed by a three-digit numeric identifier.\
+    \ The `instexpid` string can be supplied in a number of different forms, but the\
+    \ two most useful are (i) as a six-character string comprising either R1 or R2\
+    \ followed by the exposure identifier (an example: `R2S003'); (ii) the name of\
+    \ any of RGS-specific files in the ODF can also be used. This parameter is mandatory\
+    \ if `filemode`=`create', or in cases where the instrument and/or exposure can\
+    \ neither be read from the file header or deduced from its name.\n    }\n  \\\
+    optparm{writeobskwds)\t{boolean}\t{yes|no** (Optional): no\n(Type: yes}\t{boolean}\t\
+    {yes|no, Default: \n    If this is set, the task attempts to write observation-specific\
+    \ keywords to the file header. The user must point the environment variable SAS\\\
+    _ODF to the ODF directory for this to succeed.\n    }\n  \\optparm{writeexpkwds,\
+    \ Range: \n    If this is set, the task attempts to write exposure-specific keywords\
+    \ to the file header. For this to succeed, the user must point the environment\
+    \ variable SAS\\_ODF to the ODF directory, and the task must also be able to determine\
+    \ the exposure number, either via the `instexpid` parameter, or from the `EXPIDSTR`\
+    \ keyword in the file header, or (if neither are present) from the file name.\n\
+    \    }\n  \\optparm{clobberonlabel)\t{boolean}\t{yes|no}{\n    Labels in RGS source\
+    \ lists are required to be unique. Where a clash is detected between a source\
+    \ already in the list and a new candidate source, the task takes one of two actions,\
+    \ depending on the value of this parameter: if `yes', the candidate is discarded;\
+    \ if `no', the task halts with an error.\n    }\n\n  **primestyle}\t{label}\t\
+    {string** (Optional): \n    If `primestyle\n(Type: \n    Only active if \\param{changeprime`=yes\
+    \ and either `addusersource` or `userasprime`=no. It controls the way in which\
+    \ the prime source is specified. See the parameters `primelabel` and `primeindex`.\
+    \ (An additional possible value of `expression' is planned.)\n    }\n  \\optparm{primelabel}\t\
+    {PROPOSAL, Default: label|index|expr|brightest|auto, Range: string}\t{) is active\
+    \ and set to `label', this parameter gives the value of the `LABEL` column of\
+    \ the source that it is desired the `PRIMESRC` keyword should point to.\n    }\n\
+    \  **primeindex}\t{1}\t{integer}\t{$0<$primeindex** (Optional): expmedian\n(Type:\
+    \ }\t{string}\t{, Default: \n    If `primestyle` is active and set to `index',\
+    \ the `PRIMESRC` keyword is set to this value.\n    }\n  \\optparm{primeexpression,\
+    \ Range: \n    This mode is not yet supported.\n    }\n\n  \\optparm{attstyle)\t\
+    {string}{mean|median|start|user|expmedian}{\n    Controls the way the attitude\
+    \ is calculated. If `mean', the attitude is calculated from the mean of the values\
+    \ in the attitude history file. If `median', the median of these values is used.\
+    \ If the value is `start', the task uses the attitude at the start of the exposure\
+    \ as the reference attitude. A value of `expmedian' tells the task to use the\
+    \ median of the attitude during the exposure only, as calculated by Task: attfilter.\
+    \ The final value, `user', allows the user to input the numbers him/herself via\
+    \ the next three parameters.\n    }\n  **meanset}\t{atthk.dat** (Optional): \n\
+    \    The name of the attitude history file. This file is a necessary input in\
+    \ the case that `attstyle\n(Type: \n    The name of the attitude history file.\
+    \ This file is a necessary input in the case that \\param{attstyle` is `mean'.\n\
+    \    }\n  \\optparm{medianset}\t{atthk.dat, Default: dataset}\t{, Range: dataset}\t\
+    {) is `median'.\n    }\n  **attra}\t{0}\t{angle}\t\t{$0\\le$`attra`$\\le 360$**\
+    \ (Mandatory): attgti.ds:STDGTI\n(Type: \n    Only active if `attstyle`=`user'.\
+    \ The declination of the attitude, in decimal degrees.\n    }\n  \\mandparm{attapos}\t\
+    {0}\t{angle}\t{$0\\le$`attapos`$\\le 360$, Default: \n    Only active if `attstyle`=`user'.\
+    \ The right ascension of the attitude, in decimal degrees.\n    }\n  \\mandparm{attdec}\t\
+    {0}\t{angle}\t{$-90\\le$`attdec`$\\le 90$, Range: \n    Only active if `attstyle`=`user'.\
+    \ The position angle of the attitude, in decimal degrees.\n    }\n  **expmediantable){table**\
+    \ (Optional): \n    This should be set if the user wishes to add a source to the\
+    \ list with a position specified on the command line.\n    \n(Type: \n    The\
+    \ name of the table in the filtered attitude history file in which the exposure-median\
+    \ keywords can be found. This file is a necessary input in the case that `attstyle`\
+    \ is `expmedian'.\n    }\n\n  \\optparm{addusersource, Default: , Range: no}\t\
+    {boolean}\t{yes|no)\n  **label}\t{USER}\t{string}\t{** (Optional): \n    Only\
+    \ active if `addusersource\n(Type: \n    Only active if \\param{addusersource`=yes.\
+    \ The brightness of the source in counts per second. It is anticipated that this\
+    \ parameter won't be used much, since this is not a quantity that is likely to\
+    \ be known in most circumstances. The default value of 0.0 is harmless.\n    }\n\
+    \  \\optparm{userasprime}\t{no}\t{boolean}\t{yes|no, Default: \n    Only active\
+    \ if `addusersource`=yes. This is written directly to the `LABEL` column of the\
+    \ output source list. The empty string is not permitted.\n    }\n  \\optparm{rate}\t\
+    {0.0}\t{real}\t\t{$0.0<$rate, Range: \n    Only active if `addusersource`=yes.\
+    \ If `changeprime`=yes and `userasprime`=yes, then the attribute `PRIMESRC` is\
+    \ set to the index number of the user source.\n    }\n  \\optparm{process}\t{no}\t\
+    {boolean}\t{yes|no)=yes. This causes the value in the `PROCESS` column to be set\
+    \ to true for the user-added source.\n    }\n  **bkgexclude}\t{yes}\t{boolean}\t\
+    {yes|no** (Optional): \n    Only active if `addusersource\n(Type: radec, Default:\
+    \ \n    Only active if \\param{addusersource`=yes. This causes the value in the\
+    \ `BKG\\_EXCLUDE` column to be set to true for the user-added source.\n    }\n\
+    \  \\optparm{positionstyle, Range: string}\t{radec|wrtatt)=yes. If `positionstyle`=`radec',\
+    \ then the position of the user-added source is expected via the parameters `ra`\
+    \ and `dec`. If on the other hand `positionstyle`=`wrtatt' (With Respect To ATTitude),\
+    \ then the position of the user-added source is expected via the parameters `deltadisp`\
+    \ and `deltaxdsp`.\n    }\n  **ra}\t\t{0}\t{angle}\t{$0\\le$`ra`$\\le 360$** (Mandatory):\
+    \ \n    Only active if `addusersource\n(Type: \n    Only active if \\param{addusersource`=yes\
+    \ and `positionstyle`=`radec'. The declination of the user-added source, in decimal\
+    \ degrees.\n    }\n  \\mandparm{deltaxdsp}\t{0.0}\t{real}\t\t{, Default: \n  \
+    \  Only active if `addusersource`=yes and `positionstyle`=`radec'. The right ascension\
+    \ of the user-added source, in decimal degrees.\n    }\n  \\mandparm{dec}\t{0}\t\
+    {angle}\t{$-90\\le$`dec`$\\le 90$, Range: \n    Only active if `addusersource`=yes\
+    \ and `positionstyle`=`wrtatt'. The displacement in arcminutes of the user-added\
+    \ source from the pointing direction, in the dispersion direction.\n    }\n  \\\
+    mandparm{deltadisp}\t{0.0}\t{real}\t\t{)=yes and `positionstyle`=`wrtatt'. The\
+    \ displacement in arcminutes of the user-added source from the pointing direction,\
+    \ in the cross-dispersion direction.\n    }\n\n  **withepicset}\t{no}\t{boolean}\t\
+    {yes|no** (Optional): string\n(Type: \n    The name of a set containing a list\
+    \ of sources. Formats output by the tasks Task: emldetect and Task: eboxdetect\
+    \ are accepted.\n    }\n  \\optparm{epiclabelprefix, Default: \n    If this is\
+    \ set, the task looks for the parameter `epicset`, giving the name of an EPIC\
+    \ source list.\n    }\n  \\optparm{epicset}\t{}\t{dataset}\t{, Range: EPIC)\t\
+    {}{\n    This parameter gives the string which is used by the task as a prefix\
+    \ when constructing `LABEL` values for EPIC-derived sources. The other part of\
+    \ the `LABEL` is the number `ML\\_ID\\_SRC` or `BOX\\_ID\\_SRC`. The main purpose\
+    \ of this parameter is to allow several EPIC-derived source lists to be included\
+    \ in the one RGS list if desired, while retaining unique labels.\n    }\n  **doconfusion}\t\
+    {no}\t{boolean}\t{yes|no** (Optional): \n    Active only if `withepicset\n(Type:\
+    \ 3.5,1.0,1.0, Default: \n    Active only if \\param{withepicset`=true. This parameter\
+    \ causes the task to check the epic sources + proposal position for confusion\
+    \ in the EPIC field of view. It is mainly designed for use in the PCMS, to prevent\
+    \ automatic extraction of too many spectra for what is essentially the same object.\
+    \  The degree of confusion depends on the size of the PSF, which is a function\
+    \ of energy.  Therefore, strictly speaking, it depends on the selection of the\
+    \ energy band of interest (`bandids`).  At the moment, however, the a-priori energy\
+    \ of $(0.5+2)/2 = 1.25$~keV is unconditionally used for it, whatever `bandids`\
+    \ is.\n    }\n  \\optparm{instweights, Range: real list}\t{)=true.  This parameter\
+    \ gives the list of weighting factors for EPIC instruments for the use of calculation\
+    \ of RATE, where  the order is the normal ID\\_INST number (i.e., pn, MOS1 and\
+    \ 2).  The resultant RATE in the output RGS source list is normalised to 1.0 in\
+    \ the list, namely in default, it is normalised to the RATE of MOS1 (or 2).\n\
+    \    }\n  **flagepicsrcoutoffov** (Optional): \n    If this is set, the task carries\
+    \ out filtering, where only those sources, the position of which corresponds to\
+    \ cross-dispersion angles on the RGS camera between $-$2.9 and +2.9 arcminutes\
+    \ from camera centre, are regarded as a good source.  If `withepicset\n(Type:\
+    \ \n    Active only if \\param{withepicset`.  If this is set, the input EPIC sources\
+    \ falling outside the FOV (see the description of `enablefilter` for definition)\
+    \ are flagged and are not dropped from the output source list due to that reason.\
+    \  If not (default), either they are dropped from the source list (if `enablefilter`=true)\
+    \ or nothing is done.  See the description of `enablefilter` for the summary of\
+    \ the behaviour.\n    }\n  \\optparm{enablefilter, Default: no}\t{boolean}\t{yes|no,\
+    \ Range: no}\t{boolean}\t{yes|no)=true, the filtering is made also for the input\
+    \ EPIC sources, and the those EPIC sources regarded as no-good are either dropped\
+    \ out of the output list (`flagepicsrcoutoffov`=false) or just flagged as OUTOFFOV\
+    \ (if `flagepicsrcoutoffov`=false) (see section~\\ref{rgssources:description:outputfiles}\
+    \ for the OUTOFFOV flag).  Regardless of whether epic sources are added or not\
+    \ (`withepicset`), the task checks the positions of all sources if `enablefilter`\
+    \ is set and flags them as it is and warns about any that fall outside the FOV.\n\
+    \ \\begin{center}\n \\begin{tabular}{|l|cc|}\n \\multicolumn{3}{c}{When `enablefilter`=true}\\\
+    \\\n \\hline\n    & EPIC sources & Anything else\\\\\n \\hline\n  `flagepicsrcoutoffov`\
+    \ = true  & Flagged & Flagged\\\\\n  `flagepicsrcoutoffov` = false & Dropped &\
+    \ Flagged\\\\\n \\hline\n \\end{tabular}\n \\end{center}\n    }\n  **bandids**\
+    \ (Optional): yes\n(Type: integer list}\t{, Default: 2,3, Range: \n    This parameter\
+    \ gives the list of energy bands accepted for the input EPIC source list.  The\
+    \ RATE value of each source in the output RGS source list is the sum of the RATEs\
+    \ of the corresponding source for the energy bands specified with this parameter.\
+    \  For 1XMM-source-catalogue type ones, this list should be 2, whereas for 2XMM-source-catalogue\
+    \ type ones, this list should be 2, 3 (default).  Although an arbitrary number\
+    \ of elements in the list is allowed, if it is more than 9, only the first 9 energy\
+    \ bands are stated in the `E\\_mBNDnn` header keyword and the rest is unstated\
+    \ (see section~\\ref{rgssources:description:outputfiles}) in the output list.\n\
+    \    }\n  \\optparm{withboresightfudge)\t{boolean}\t{yes|no}{\n    Flip the sign\
+    \ of the boresight euler\\%psi.  {\\bf This parameter will be removed} after the\
+    \ boresight is fixed. \n    }\n\n[INPUT FILES]\nrgssources\n1.  EPIC sources set\
+    \ with a binary extension table named ‘SRCLIST‘\n    (required only if ‘withepicset‘\
+    \ = ‘yes’).\n\n    The following columns need to be present in this table:\n\n\
+    \    -   ‘RA‘: this value is copied into the RGS column of the same name.\n\n\
+    \    -   ‘DEC‘: this value is copied into the RGS column of the same\n       \
+    \ name.\n\n    -   ‘ML_ID_SRC‘ (if the source list was made by Task: emldetect)\
+    \ or\n        ‘BOX_ID_SRC‘ (if the source list was made by Task: eboxdetect):\n\
+    \        this number is included in the ‘LABEL‘ value of the source in\n     \
+    \   the RGS list.\n\n    -   ‘ID_BAND‘: this value is used in distinguishing the\
+    \ energy band\n        in calculating RATE (see below).\n\n    -   ‘RATE‘: the\
+    \ sum of these values in the specified energy bands\n        are written in the\
+    \ output RGS list. The energy band (ID) is\n        listed in the above-mentioned\
+    \ ‘ID_BAND‘ column, whereas the\n        energy band IDs are specified in ‘bandids‘\
+    \ command-line\n        parameter.\n\n2.  RGS sources set as described in the\
+    \ ‘Output files’ section (required\n    only if ‘filemode‘ = ‘modify’).\n\n3.\
+    \  The attitude history file created by Task: atthkgen (required only\n    if\
+    \ ((‘filemode‘ = ‘modify’ and ‘changeattitude‘ = ‘yes’) or\n    ‘filemode‘ = ‘create’)\
+    \ and ‘attstyle‘ = ‘mean’ or ‘median’.).\n\n4.  The filtered attitude history\
+    \ file created by Task: attfilter\n    (required only if ((‘filemode‘ = ‘modify’\
+    \ and ‘changeattitude‘ =\n    ‘yes’) or ‘filemode‘ = ‘create’) and ‘attstyle‘\
+    \ = ‘expmedian’.).\n\n[OUTPUT FILES]\nrgssources\n1.  RGS sources set with a binary\
+    \ extension table named ‘SRCLIST‘. The\n    header has all the keywords mandatory\
+    \ for PPS products, in\n    particular\n\n    -   ‘RA_PNT‘: The right ascension\
+    \ of the attitude in decimal\n        degrees.\n\n    -   ‘DEC_PNT‘: The declination\
+    \ of the attitude in decimal degrees.\n\n    -   ‘PA_PNT‘: The position angle\
+    \ of the attitude in decimal degrees.\n\n    The ‘SRCLIST‘ table has the following\
+    \ keywords:\n\n    -   ‘PRIMESRC‘: The ‘INDEX‘ value (see column description below)\
+    \ of\n        the prime source.\n\n    -   ‘E_EXPRn‘: There are n ( ≤ 99) occurrences\
+    \ of this keyword, one\n        for each EPIC source list added to the RGS list.\
+    \ The numbers ‘n‘\n        are consecutive, starting at 1. The values of these\
+    \ keywords are\n        taken from the ‘INSTRUME‘ header keyword in the input\
+    \ EPIC\n        source list (that is, probably EPN, in most of the cases, which\n\
+    \        does not carry a lot of practical meaning, in fact), although it\n  \
+    \      used to be the exposure IDs of the respective EPIC source files\n     \
+    \   (in the old-style source lists).\n\n    -   ‘E_CONTn‘: Similar to the ‘E_EXPRn‘\
+    \ keyword, but this records\n        the value of the ‘CONTENT‘ keyword in the\
+    \ EPIC file header.\n\n    -   ‘E_mBNDn‘: Similar to the ‘E_EXPRn‘ keyword, but\
+    \ this records\n        the value of either ‘ID_BAND‘ (in the input RGS source\
+    \ file,\n        when ‘filemode‘=‘modify’) or ‘bandids‘, which is used to select\n\
+    \        the EPIC sources and to calculate the RATE value, transmitted\n     \
+    \   into the output RGS source list. Note that this used to be\n        ‘E_BANDn‘(=2)\
+    \ before Ver.6.0. If ‘filemode‘=‘modify’ and if the\n        input RGS source\
+    \ list has ‘E_BANDn‘ keywords, then they will be\n        preserved in the output\
+    \ RGS source list (i.e., both ‘E_BANDn‘\n        and ‘E_mBNDn‘ keywords may appear).\n\
+    \n    -   ‘E_FILTn‘: Similar to the ‘E_EXPRn‘ keyword, but this records\n    \
+    \    the value of the ‘FILTER‘ keyword in the EPIC file header.\n\n    The ‘SRCLIST‘\
+    \ table has the following columns:\n\n      Column name:      Data type:  Description:\n\
+    \      ---------------- ------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n\
+    \      ‘INDEX‘             int16     Source index number. Each source has a unique\
+    \ value, which Task: rgssources never alters.\n      ‘LABEL‘             string\
+    \    Label for the source. These values are also unique to each source. Only upper\
+    \ case is used. At present, label values can only be 20 characters or less in\
+    \ length. Trailing spaces are not allowed.\n      ‘RA‘                real32 \
+    \   J2000 right ascension in decimal degrees.\n      ‘DEC‘               real32\
+    \    J2000 declination in decimal degrees.\n      ‘RATE‘              real32 \
+    \   Counts per second.\n      ‘DELTA_DISP‘        real32    Offset on the sky,\
+    \ in the dispersion direction, of the source with respect to the pointing direction.\
+    \ Given in arcminutes.\n      ‘DELTA_XDSP‘        real32    Offset on the sky,\
+    \ in the cross-dispersion direction, of the source with respect to the pointing\
+    \ direction. Given in arcminutes.\n      ‘FOV_PHI‘           real32    This and\
+    \ the next column give the polar coordinates of ‘DELTA_DISP‘ and ‘FOV_PHI‘. Units\
+    \ for both are decimal degrees. ‘FOV_PHI‘ is the angle of the source position\
+    \ from the -ve dispersion axis towards the +ve cross-dispersion axis.\n      ‘FOV_R‘\
+    \             real32    \n      ‘CONFUSION‘         real32    This is a measure\
+    \ of how confused the source is with respect to the prime source. See subsection\
+    \ [confusion] for a description of how it is calculated. It is a dimensionless\
+    \ number.\n      ‘PROCESS‘            bool     This column is used by Task: rgsregions\
+    \ to flag those sources for which spectrum extraction regions should be calculated.\
+    \ This column is no longer set by Task: rgssources, though, so all values are\
+    \ written as false in principle. An exception is the case of ‘filemode‘=‘modify’;\
+    \ in that case the PROCESS column in the input RGS source list is in principle\
+    \ preserved. Another exception is the sources added by the user (‘addusersource‘=true),\
+    \ where the value of the command-line option ‘process‘ is written as it is in\
+    \ principle. In any case, if ‘filemode‘=‘modify’ and ‘changeattitude‘=true, all\
+    \ PROCESS values are forcibly written as false regardless of the value ‘process‘\
+    \ or PROCESS in the input RGS source list.\n      ‘BKG_EXCLUDE‘        bool  \
+    \   This column is used by Task: rgsregions to flag those sources which should\
+    \ be excluded from the background spectrum extraction region. This column is no\
+    \ longer set by Task: rgssources, so all values are written as false.\n      ‘FIXED_ON_SKY‘\
+    \       bool     This column flags those sources for which the positional information\
+    \ was derived from right ascension and declination. The only sources for which\
+    \ ‘FIXED_ON_SKY‘ is false are the attitude source and any user source supplied\
+    \ with ‘userstyle‘=‘wtatt’.\n\n      Column name:    Data type:  Description:\n\
+    \      -------------- ------------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n\
+    \      ‘EPIC_FILE‘       int16     This gives the number of the ‘E_EXPRn‘, ‘E_CONTn‘,\
+    \ ‘E_mBNDn‘ (or ‘E_BANDn‘ before Ver.6.0) and ‘E_FILTn‘ keywords appropriate to\
+    \ the source if it has been derived from an EPIC source list. Eg, for ‘EPIC_FILE‘=3,\
+    \ the details of the original list from which this source came can be found from\
+    \ the keywords ‘E_EXPR3‘, ‘E_CONT3‘, ‘E_mBND3‘ and ‘E_FILT3‘.\n      ‘FLAG‘  \
+    \          int32     If non-zero, something goes wrong in the source. It is a\
+    \ binary (bit-type) form of representation for each cause – see the following\
+    \ table for detail (n.b., The representation of this FLAG column is entirely different\
+    \ from that in the input EPIC source list). Note that some of the checks may be\
+    \ bypassed if requested (by command-line parameters); for example if ‘enablefilter‘=false\
+    \ and ‘flagepicsrcoutoffov‘=false, no check for OUTOFFOV is carried out.\n\n \
+    \   The following is the description for the ‘FLAG‘ column:\n\n      Name    \
+    \      Bit  Description\n      ------------ ----- ---------------------------------------------------------------\n\
+    \      OUTOFFOV       0   The source is out of field of view.\n      CONFUSED\
+    \       1   The source may be confused with other source(s).\n      BADBAND[1]\
+    \     2   The energy band used (hence RATE) may be wrong.\n      WIDESRC     \
+    \   3   The source is greater than 90 degrees away from the pointing.\n\n    Note\
+    \ that the RGS source list set is also used to store the spectrum\n    extraction\
+    \ regions created by Task: rgsregions. These become\n    invalidated if the attitude\
+    \ is altered; in this case Task:\n    rgssources deletes them. See the algorithm\
+    \ (section\n    [rgssources:description:algorithm]) for details of the circumstances\n\
+    \    under which this occurs.\n\n    The RGS source list table is required to\
+    \ have 1 source whose\n    position is taken from the observation proposal, and\
+    \ 1 source whose\n    position is equal to the RGS attitude (stored in the dataset\
+    \ header\n    keywords ‘RA_PNT‘, ‘DEC_PNT‘ and ‘PA_PNT‘). The ‘LABEL‘ values of\n\
+    \    these two sources are PROPOSAL and ONAXIS respectively.\n\n[1] Since Ver.6.0,\
+    \ this flag is not set by rgssources.\n\n[ABSTRACT] rgssources\nThe task constructs\
+    \ a list of sources that are to be processed by RGS\npipeline.\n[DESCRIPTION]\
+    \ rgssources\n[ATTITUDE PARAMETERS.] rgssources\n[CCF.] rgssources\nTo access\
+    \ this, the user should set SAS_CCF in the usual way.\n[ADDING FURTHER SOURCES.]\
+    \ rgssources\n[FUTURE DEVELOPMENTS] rgssources\n-\n[CAL USAGE] rgssources\n- \
+    \  CAL_setState\n\n-   CAL_getMiscellaneousDataValue"
+- source_sentence: What are the possible warning messages listed in the excerpt?
+  sentences:
+  - 'General cross-correlation products
+    These PPS cross-correlation products list the names of all catalogues
+    searched (both around each EPIC position and in the whole EPIC field)
+    and describe the format of their output.
+    '
+  - 'This product is no longer made by the pipeline. A scientifically
+    meaningful flatfield image can not readily be constructed from onboard
+    flat-field images. A unit flatfield is considered to be adequate and so
+    creation of this product was dropped from the processing.
+    '
+  - "rgsregions\n## Errors\n\n \\label{rgsregions:description:errorconditions}\n\n\
+    \ **Error:** noExposureMaps.\n  }\n\n **Warning:** fractionalCoverage,\n  `xpsfexcl`,\
+    \ or `pdistincl`) was given a value greater\n  than zero but less than one, suggesting\
+    \ that the user has forgotten\n  that these parameters are specified as percentages.\n\
+    \  }\n\n **Warning:** protectedRegion\n\n **Warning:** emptyRegion\n\n"
+- source_sentence: What happens if the number of types and scopes provided is not
+    equal in cifremove?
+  sentences:
+  - "-   For each RGS detector there is a single file containing filtered\n    events\
+    \ from all CCDs.\n\n-   The structure of the FITS file is:\n\n    1.  Primary\
+    \ header with null primary array.\n\n    2.  A binary table extension containing\
+    \ event data\n        ( EXTNAME=’EVENTS’).\n\n    3.  Per CCD (m =1-9) a standard\
+    \ GTI extension (STDGTI0m).\n\n    4.  Per CCD (m) and per CCD readout node (n=0-1),\
+    \ a bad pixel\n        extension (BADPIXnm).\n\n    5.  Per CCD (m) and per CCD\
+    \ readout node (n), a rejected pixel\n        extension (REJPIXnm).\n\n    6.\
+    \  Per CCD (m) an exposure extension (EXPOSU0m).\n\n    7.  Per CCD (m) and per\
+    \ readout node (n), an exposure map extension\n        EXPMAPnm\n\n-   These files\
+    \ are identified using the keyword\n\n        CONTENT = 'RGS EVENT LIST'\n\n \
+    \   in the primary header.\n\n-   This is a product of class RGSEXP.\n\n-   The\
+    \ EVENTS extension comprises a binary table extension with the\n    following\
+    \ columns:\n\n      Name              Type             Description\n      -----------------\
+    \ ---------------- -----------------------------------------------\n      TIME\
+    \              8-byte REAL      Frame timestamp\n      FLAG              4-byte\
+    \ INTEGER   Event attribute flags\n      BETA              4-byte REAL      Uncorrected\
+    \ dispersion angle\n      XDSP              4-byte REAL      Uncorrected cross-dispersion\
+    \ angle\n      CHIPX             2-byte INTEGER   Chip X coordinate (pixel)\n\
+    \      CHIPY             2-byte INTEGER   Chip Y coordinate (pixel)\n      PHA\
+    \               2-byte INTEGER   Total telemetered energy\n      SHAPE       \
+    \      BYTE             Event shape identifier\n      GRADE             BYTE \
+    \            Total number of pixels\n      PI                2-byte INTEGER  \
+    \ Total corrected CCD event energy\n      CCDNR             BYTE             CCD\
+    \ ID number\n      BETA_CORR         4-byte REAL      Attitude corrected dispersion\
+    \ angle (radians)\n      XDSP_CORR         4-byte REAL      Attitude corrected\
+    \ cross-disp angle (radians)\n      M_LAMBDA          4-byte REAL      Wavelength\
+    \ spectral-order product\n      BETA_CHANNEL      2-byte INTEGER   BETA_CORR channel\n\
+    \      MLAMBDA_CHANNEL   2-byte INTEGER   M_LAMBDA channel\n      XDSP_CHANNEL\
+    \      2-byte INTEGER   XDISP_CORR channel\n\n-   Event times are specified in\
+    \ seconds after a reference time\n    specified in a header keyword (MJDREF).\n\
+    \n-   The STDGTI0m extension comprises a binary table extension with the\n   \
+    \ following columns:\n\n      Name    Type          Description\n      -------\
+    \ ------------- ------------------------------------------\n      START   8-byte\
+    \ REAL   GTI start time (s) since reference epoch\n      STOP    8-byte REAL \
+    \  GTI end time (s) since reference epoch\n\n-   The BADPIXnm extension contains\
+    \ a binary table extension with the\n    following columns:\n\n      Name    \
+    \  Type             Description\n      --------- ---------------- --------------------------------\n\
+    \      CHIPX     2-byte INTEGER   Chip X coordinate (pixel)\n      CHIPY     2-byte\
+    \ INTEGER   Chip Y coordinate (pixel)\n      YEXTENT   2-byte INTEGER   Extent\
+    \ of badness in Y (pixel)\n      TYPE      2-byte INTEGER   Type of badness\n\
+    \      BADFLAG   2-byte INTEGER   Data source flag\n\n-   The REJPIXnm extension\
+    \ contains a binary table extension with the\n    following columns:\n\n     \
+    \ Name    Type             Description\n      ------- ---------------- ---------------------------\n\
+    \      FRAME   4-byte INTEGER   Frame identifier\n      FLAG    4-byte INTEGER\
+    \   Event attribute flags\n      CHIPX   2-byte INTEGER   Chip X coordinate (pixel)\n\
+    \      CHIPY   2-byte INTEGER   Chip Y coordinate (pixel)\n\n-   The EXPOSU0m\
+    \ extension contains a binary table extension with the\n    following columns:\n\
+    \n      Name       Type             Description\n      ---------- ----------------\
+    \ ---------------------------------------------\n      FRAME      4-byte INTEGER\
+    \   Frame identifier\n      NLOSTEVT   2-byte INTEGER   Number of lost events\
+    \ in frame\n      ABORTFLG   2-byte INTEGER   Abort frame flag\n      FLAG   \
+    \    4-byte INTEGER   Frame attributes\n      TIMEDEL    4-byte REAL      Frame\
+    \ integration time (s)\n      TIME       8-byte REAL      Seconds since MJDREF\n\
+    \      FRACEXP0   4-byte REAL      Exposure fraction node 0\n      FRACEXP1  \
+    \ 4-byte REAL      Exposure fraction node 1\n      ASPCDSP    4-byte REAL    \
+    \  Aspect correction applied to BETA (radians)\n      ASPCXDSP   4-byte REAL \
+    \     Aspect correction applied to XDSP (radians)\n\n-   The EXPMAPnm extension\
+    \ is an image extension containing the exposure\n    map for CCD m, node n.\n\n\
+    -   This is a science product suitable for use in further data analysis.\n\n-\
+    \   There will be a single event file per exposure. The event lists will\n   \
+    \ typically be 10 MB uncompressed\n"
+  - "The source list for a grism exposure represents a list of detections of\nall\
+    \ the zeroth order and/or first-order spectrum features in the OSW\nimage.\n\n\
+    -   The source detection list is supplied in FITS format.\n\n-   These files are\
+    \ identified using the keyword\n\n        CONTENT = 'OM OSW GRISM SOURCE LIST'\n\
+    \n    in the primary header.\n\n-   This is a product of class OMSW.\n\n-   The\
+    \ OGIP filetype is defined by the keywords\n\n        HDUCLASS= 'OGIP    '   \
+    \        / Format conforms to OGIP/GSFC conventions\n        HDUCLAS1= 'SRCLIST\
+    \ '           / File contains a source list\n\n    in the primary header.\n\n\
+    -   The data extension (EXTNAME = ’SRCLIST’) contains a binary table\n    with\
+    \ the following columns:\n\n      Name           Type             Description\n\
+    \      -------------- ---------------- --------------------------------------------------------------\n\
+    \      SRCNUM         4-byte INTEGER   Source number\n      XPOS           4-byte\
+    \ REAL      X-pixel position\n      YPOS           4-byte REAL      Y-pixel position\n\
+    \      POSERR         4-byte REAL      Positional error (pixels)\n      SPB_COILOSS\
+    \    4-byte REAL      Coincidence loss correction in source+background\n     \
+    \ BK_COILOSS     4-byte REAL      Coincidence loss correction in background\n\
+    \      FWHM_MAJ       4-byte REAL      Source FWHM (ellipse major axis)\n    \
+    \  FWHM_MAJ_ERR   4-byte REAL      Source FWHM (major axis) error\n      FWHM_MIN\
+    \       4-byte REAL      Source FWHM (ellipse minor axis)\n      FWHM_MIN_ERR\
+    \   4-byte REAL      Source FWHM (minor axis) error\n      PA             4-byte\
+    \ REAL      Position angle of ellipse major axis\n      PA_ERR         4-byte\
+    \ REAL      Source position angle error\n      QFLAG          16-bit INTEGER \
+    \  Quality flag\n      CFLAG          8-bit INTEGER    Confusion flag\n      EFLAG\
+    \          8-bit INTEGER    Extension flag\n      SPECTR_ID      4-byte INTEGER\
+    \   Spectrum identifier\n      REL2SRCNUM     4-byte INTEGER   Identifies related\
+    \ spectrum and zeroth order feature entries\n\n-   This is a science product.\
+    \ The OM OSW source list is the first stage\n    analysis of the OSW for grism\
+    \ data.\n\n-   The grism source lists is notably distinct from the normal imaging\n\
+    \    and FAST source lists because many entries are the detections of the\n  \
+    \  spectra themselves, not just the zeroth order features that map to\n    the\
+    \ objects on the sky. At the current time, the SSC pipeline does\n    not insert\
+    \ celestial coordinates (RA and DEC) in the file though\n    this is expected\
+    \ to change in a future pipeline release. The ellipse\n    parameters of the detections\
+    \ largely reflect dispersion in the\n    spectrum and zeroth order features, rather\
+    \ than intrinsic extension\n    of the sky object.\n\n-   There is one file per\
+    \ OSW per exposure. Each file is typically 24KB\n    uncompressed.\n"
+  - "cifremove\n     \n      cif=parameter(calindexset)\n      if(parameterCount(types)\
+    \ != parameterCount(scopes)){\n        error(ParameterCountMismatch)\n      }\n\
+    \      foreach(type-scope pair){\n        if(! cif.has(type, scope)){\n      \
+    \    warning(NoMatchingCcfConstituent)\n        } else {\n          cif.remove_entry(type,\
+    \ scope)  \n        }\n      }\n"
+- source_sentence: What are the task parameters of binadapt?
+  sentences:
+  - "backscale\n## Parameters\n\n\\label{backscale:description:parameters}\n\n **spectrumset**\
+    \ (Mandatory): \n  Name of the input file\n  \n(Type: string, Default: spectrum.ds,\
+    \ Range: )\n **badpixlocation** (Optional): \n  Name of the file containing the\
+    \ bad pixels, initially this\n  is the event file.\n  \n(Type: string, Default:\
+    \ notSpecified, Range: )\n **withbadpixcorr** (Optional): \n  Whether to use bad\
+    \ pixels and chip gaps in the calculation.\n  \n(Type: boolean, Default: yes,\
+    \ Range: )\n **useodfatt** (Optional): \n  Whether to use the ODF attitude file\
+    \ to construct position info.\n  \n(Type: boolean, Default: no, Range: )\n **ignoreoutoffov**\
+    \ (Optional): \n  Whether area outside the field of view should be included\n\
+    \  in the backscale calculation.\n  \n(Type: boolean, Default: yes, Range: )\n\
+    \ **withbadpixres** (Optional): \n  Whether a grid resolution has been specified\
+    \ on the command line.\n  If not set then the task uses the default badpixelresolution\
+    \ set by\n  the Task: arfgen\n(Type: boolean, Default: no, Range: ) task.\n  \n\
+    \ **badpixelresolution** (Optional): \n  The grid resolution to use when calculating\
+    \ the area. If set then this overrides\n  the value used internally by Task: arfgen\n\
+    (Type: float, Default: , Range: ). A value such as 2.0, will result in \n  a faster\
+    \ execution time at the expense of accuracy.\n  \n%   **detmaptype** (Optional):\
+    \ detmapfile.ds:\n(Type: choice, Default: flat, Range: dataset flat}\n%    {\n\
+    %    This is the detector map type. It should be left as the default\n%    'flat'\
+    \ in all cases except where the source region contains a\n%    spatial mask. In\
+    \ this case it should be set to 'dataset' and\n%    an image of the extraction\
+    \ region given in `detmaparray`.\n%    }\n\n%   \\optparm{detmaparray){array}{none}\n\
+    %    {\n%    Name of detector map dataset and array in the DAL compound notation.\n\
+    %    Only used if `detmaptype` is set to `dataset`.\n%    }\n\n[INPUT FILES]\n\
+    backscale\n-   an EPIC spectrum file containing a datasubspace definition\n\n\
+    -   an optional second file containing the bad pixel extensions\n\n[OUTPUT FILES]\n\
+    backscale\n-   The input spectrum is modified\n\n[ABSTRACT] backscale\nA tool\
+    \ for calculating and writing the BACKSCAL keyword in EPIC spectra.\n[SPATIAL\
+    \ MASKS] backscale\n[CAL USAGE] backscale\nThe metatask doesn’t use the CAL directly\
+    \ but calls Task: arfgen which\nuses the routine CAL_onCcd to determine whether\
+    \ a pixel lies on a CCD.\n[FUTURE DEVELOPMENTS] backscale\nThere is scope for\
+    \ reorganising Task: arfgen to make the BACKSCAL\ncalculation a bit faster."
+  - "binadapt\n## Parameters\n\n**prefix** (Optional): Input inst+exposure ID (1S001,\
+    \ S003) OR “comb” to\nuse combimage inputs.\n\n(Type: string, Default: comb, Range:\
+    \ ) The program defaults to deriving\na filename of the form comb-elow-ehigh.fits.\
+    \ If using singular expids,\nenter that as the prefix.\n\n**elow** (Optional):\
+    \ Lower energy limit for the energy band in eV.\n\n(Type: int, Default: 350, Range:\
+    \ 0 <  = elow <  = 11999) **ehigh**\n(Optional): Upper energy limit for the energy\
+    \ band in eV.\n\n(Type: int, Default: 1100, Range: 1 <  = ehigh <  = 12000)\n\
+    **withpartbkg** (Optional): Particle background control, \"yes\" to\nsubtract\
+    \ the model (QPB) particle background image.\n\n(Type: bool, Default: no, Range:\
+    \ T/F) **withspbkg** (Optional): Soft\nproton background control, \"yes\" to subtract\
+    \ the soft proton background\nimage.\n\n(Type: bool, Default: no, Range: T/F)\
+    \ **withswcxbkg** (Optional): Solar\nwind charge exchange background control,\
+    \ \"yes\" to subtract the SWCX\nbackground image.\n\n(Type: bool, Default: no,\
+    \ Range: T/F) **withmask** (Optional): Apply\nadditional masking using input image?\n\
+    \n(Type: bool, Default: no, Range: T/F) **maskfile** (Optional): The\nfilename\
+    \ for an image to provide additional masking if desired.\n\n(Type: dataset, Default:\
+    \ default, Range: ) If left blank (AND\nwithmask=T), binadapt will derive a mask\
+    \ filename based on other\nparameters. The mask images must be the same size and\
+    \ projection as the\nother images. Since masks can come from many sources, it\
+    \ is recommended\nthe user enter withmask=T maskfile=yourmaskfile together.\n\
+    **withbinning** (Optional): Perform binning?\n\n(Type: bool, Default: yes, Range:\
+    \ T/F) **binfactor** (Optional): Binning\nfactor.\n\n(Type: int, Default: 2, Range:\
+    \ 1 <  = binfactor) **withsmoothing**\n(Optional): Perform smoothing?\n\n(Type:\
+    \ bool, Default: yes, Range: T/F) **smoothcounts** (Optional):\nSmoothing factor.\n\
+    \n(Type: int, Default: 50, Range: 1 <  = smoothcounts <  = 100)\n**maskthresh**\
+    \ (Optional): The scale factor for excluding regions from\nthe smoothing based\
+    \ on a mask image.\n\n(Type: real, Default: 0.02, Range: 0.001 <  = maskthresh)\
+    \ In the default\nmode the average exposure is calculated and then any pixel with\
+    \ exposure\nless than fraction*average value is excluded.\n[INPUT FILES]\nbinadapt\n\
+    Binadapt will create filenames based on parameters input, especially\nprefix,\
+    \ elow, and ehigh\n\nThe user can choose to enter either a prefix designating\
+    \ the\ninstrument + expid, e.g. 1S001, 2S002, S003 OR the string “comb” to use\n\
+    products from the task combimage. Either will initiate deriving all the\ninput\
+    \ filenames based on other input parameters. If no prefix is given,\nbinadapt\
+    \ will derive an input filename of the form:\ncomb-elow-ehigh.fits, the default\
+    \ output from combimage.\n\nValid input filenames derived are, in the case of\
+    \ a prefix being\nentered, e.g.:\n\n    binadapt prefix=1S001 withspbkg=T withpartbkg=T\
+    \ withswcx=T withmask=T\n    maskfile=mymaskimage.fits elow=400 ehigh=2000\n\n\
+    mos1S001-fovimsky-400-2000.fits (from mosspectra)\nmos1S001-expimsky-400-2000.fits\
+    \ (from mosspectra)\nmos1S001-bkgimsky-400-2000.fits (from mosback)\nmos1S001-swcximsky-400-2000.fits\
+    \ (from swcx)\nmos1S001-protimsky-400-2000.fits (from proton)\nmos1S001-maskimsky-400-2000.fits\
+    \ (from emask, et al)\n[NOTE: binadapt will, by default, create a mask file name\
+    \ as above, but\nsince masks can come from different tasks, if you have a mask\
+    \ NOT of\nthat style, simply set withmask=T and maskfile=yourmaskname to override\n\
+    the default]\n\nValid input filenames derived are, in the case of a comb being\
+    \ entered,\ne.g.:\n\n    binadapt prefix=comb withspbkg=T withpartbkg=T withswcx=T\
+    \ withmask=T\n    maskfile=mymaskimage.fits elow=400 ehigh=2000\n\nAll of these\
+    \ files are produced by the task combimage after running\nany/all of mosspectra/\
+    \ mosback/ swcx/ proton comb-fovimsky-400-2000.fits\ncomb-expimsky-400-2000.fits\n\
+    comb-bkgimsky-400-2000.fits\ncomb-swcximsky-400-2000.fits\ncomb-protimsky-400-2000.fits\n\
+    comb-maskimsky-400-2000.fits (from emask, et al\nNOTE: typically, a mask produced\
+    \ by combimage will have the name style\nas above, but if you rename your files,\
+    \ simply set withmask=T and\nmaskfile=yourmaskname to override the default\n\n\
+    [OUTPUT FILES]\nbinadapt\nIf withsmoothing=T, binadapt creates an adaptively smoothed,\
+    \ exposure\ncorrected, and background subtracted (any selected) image in SKY coords:\n\
+    \nmos1S001-adaptimsky-350-1100.fits\n\nA 900x900 Real32 image of the smoothing\
+    \ FWHM:\n\nmos1S001-sizeimsky-350-1100.fits\n\nNote: both of the above are also\
+    \ binned if withbinning=T\n\nA QDP plot file of the radial profile of the data\
+    \ for the selected\nenergy band (elow and ehigh) of the selected region:\n\nmos1S001-radfilt-350-1100.qdp\n\
+    \nA histogram of the smoothing FWHM:\n\nmos1S001-size-350-1100.qdp\n\nIf withsmoothing=F,\
+    \ only these binned, exposure corrected, and\nbackground subtracted (any selected)\
+    \ images are created:\n\nThe binned count rate uncertainty image for the selected\
+    \ energy band\n(elow and ehigh) of the selected region in SKY coordinates:\n\n\
+    mos1S001-sigimsky-350-1100.fits\n\nand the binned count rate image for the selected\
+    \ energy band (elow and\nehigh) of the selected region in sky coordinates:\n\n\
+    mos1S001-rateimsky-350-1100.fits\n\nA QDP plot file of the radial profile of the\
+    \ data for the selected\nenergy band (elow and ehigh) of the selected region:\n\
+    \nmos1S001-radfilt-350-1100.qdp\n\n[FUTURE DEVELOPMENTS] binadapt"
+  - "-   This extension contains the exposed fraction of each frame per CCD\n    (in\
+    \ the pn the frame time is constant, and is therefore not included\n    in this\
+    \ extension).\n\n-   There is one extension per CCD in the relevant mode (IMAGING\
+    \ or\n    TIMING) during the exposure.\n\n-   The following keywords are present\
+    \ in all cases (example values\n    shown):\n\n        CCDID   =             \
+    \       1 / CCD Identifier\n        FRMTIME =                 2600 / Nominal frame\
+    \ integration time\n        WINDOWX0=                    1 / X coordinate of bottom\
+    \ left corner of window\n        WINDOWY0=                    1 / Y coordinate\
+    \ of bottom left corner of window\n        WINDOWDX=                  600 / Size,\
+    \ along x-axis, of window\n        WINDOWDY=                  600 / Size, along\
+    \ y-axis, of window\n\n-   In addition, the following keywords are present in\
+    \ EPIC pn EXPOSUnn\n    extensions:\n\n         QUADRANT=                    0\
+    \ / Quadrant Identifier\n         QUADMODE=                    0 / Quadrant mode\n\
+    \         CCDMODE =                    0 / CCD mode\n         SINGLES =      \
+    \         588287 / number of single events\n         DOUBLES =               \
+    \ 67309 / number of double events\n         TRIPLES =                 2920 / number\
+    \ of triple events\n         QUADRUPL=                 4607 / number of quadruple\
+    \ events\n         NOTRECEV=                 3958 / number of not recognized events\n\
+    \         NOTRECPA=               171641 / number of not recognized patterns\n\
+    \         MAXPAT  =                  263 / maximum pattern size\n         MIPS\
+    \    =                    3 / number of MIPs found\n         RECPHOTO=       \
+    \        664123 / number of recognized photons\n         ANALYSED=           \
+    \    924737 / number of analysed events\n\n-   For both imaging and timing mode\
+    \ MOS event lists this extension\n    contains the following columns:\n\n    \
+    \  Name      Type          Description\n      --------- ------------- -------------------------------------------------\n\
+    \      TIME      8-byte REAL   Frame start time (seconds since reference time)\n\
+    \      TIMEDEL   4-byte REAL   Duration of frame time (seconds)\n      FRACEXP\
+    \   4-byte REAL   Fractional exposure of frame\n\n-   For both imaging and timing\
+    \ mode pn event lists this extension\n    contains the following columns:\n\n\
+    \      Name      Type          Description\n      --------- ------------- -------------------------------------------------\n\
+    \      TIME      8-byte REAL   Frame start time (seconds since reference time)\n\
+    \      FRACEXP   4-byte REAL   Fractional exposure of frame\n"
+- source_sentence: In nearly all cases, how many source and background region spectra
+    are supplied for the RGS?
+  sentences:
+  - "Parameter dialogs\n\nEach task has an associated parameter dialog window. These\
+    \ individual\ntask GUIs are used to enter the values of the different task parameters\n\
+    and to . The parameter dialog windows are opened by double-clicking any\nof the\
+    \ tasks listed under the \"task\" column.\n\nThe following parameter dialog window\
+    \ (figure [fig:gui:parameterdialog])\nillustrates some of the basic parameter\
+    \ types. Each parameter type has a\ncorresponding widget type. For example, a\
+    \ boolean parameter is entered\nusing a check-box (withexposure); a choice parameter\
+    \ is entered by using\na pop-up menu that allows to select from a set of options\
+    \ (sampling); a\nfilename parameter is entered as a string (imagesets), with the\
+    \ option\nof popping up a file browser by pressing the button with the folder\
+    \ icon\n(see § [gui:browser]).\n\nIf the task has a large number of parameters,\
+    \ the dialog window may have\nscroll-bars. The scroll bars will disappear if the\
+    \ size of the dialog\nwindow is increased sufficiently.\n\nFurther information\
+    \ on a parameter can be obtained by placing the cursor\nover the parameter widget.\
+    \ This causes a yellow tool-tip to pop-up if\nthe parameter file defines a prompt\
+    \ field for the parameter.\n\nThe parameter dialog has the following buttons:\n\
+    \n  ---------- ------------------------------------------------------------------------\n\
+    \  Run        Run the task with the selected parameters\n  Cancel     Close the\
+    \ parameter dialog window without running the task or changing\n             the\
+    \ parameters\n  Save       Saves the value of the parameters\n  Defaults   Reset\
+    \ the parameters to their default values\n  ---------- ------------------------------------------------------------------------\n\
+    \nWhen a task has been run, the parameter values are retained until the\nnext\
+    \ time that the task is run (within the same session). The Defaults\nbutton may\
+    \ be used to reset the parameters of a task to their default\nvalues. The \"Task\"\
+    \ menu in the main SAS GUI provides an option \"Revert\nto defaults\" to reset\
+    \ all the parameters of all the tasks to their\ndefaults.\n"
+  - "-   This extension gives the good time intervals for the event list.\n\n-   There\
+    \ is one extension per CCD in the relevant mode (IMAGING or\n    TIMING) during\
+    \ the exposure.\n\n-   The following keywords are present:\n\n        HDUCLASS=\
+    \ 'OGIP    '           / format conforms to OGIP standard\n        HDUCLAS1= 'GTI\
+    \     '           / table contains Good Time Intervals\n        HDUCLAS2= 'STANDARD'\
+    \           / standard Good Time Interval table\n\n-   This extension contains\
+    \ the following columns:\n\n      Name    Type          Description\n      -------\
+    \ ------------- --------------------------------\n      START   8-byte REAL  \
+    \ seconds (since reference time)\n      STOP    8-byte REAL   seconds (since reference\
+    \ time)\n"
+  - 'RGS spectral products
+    This section describes the spectral data products to be generated from
+    pointed observations.
+    Source and background region spectra and a background-subtracted source
+    spectrum are supplied for the brightest point sources in the RGS (in
+    nearly all cases this is just one source). Spectral response matrices
+    are also supplied.
+    '
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+# SentenceTransformer based on nomic-ai/modernbert-embed-base
+This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+- **Maximum Sequence Length:** 8192 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
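+The pooling and normalization modules above can be illustrated without loading the model. The following is a minimal sketch, not the library's implementation: it assumes that mean pooling averages the token vectors whose attention-mask entry is 1, and that `Normalize()` rescales the result to unit L2 length. The helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are hypothetical, chosen only for illustration (the real model uses 768 dimensions).
+```python
+import math
+
+
+def mean_pool(token_vectors, attention_mask):
+    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
+    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
+    if not kept:
+        raise ValueError("attention_mask selects no tokens")
+    dim = len(kept[0])
+    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
+
+
+def l2_normalize(vector):
+    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
+    norm = math.sqrt(sum(x * x for x in vector))
+    if norm == 0.0:
+        return list(vector)
+    return [x / norm for x in vector]
+
+
+# Three toy token vectors; the last one is padding and is masked out.
+tokens = [[2.0, 0.0], [4.0, 8.0], [100.0, 100.0]]
+mask = [1, 1, 0]
+
+pooled = mean_pool(tokens, mask)      # mean of the first two vectors only
+embedding = l2_normalize(pooled)      # unit-length sentence embedding
+print(pooled)
+print(embedding)
+```
+Masking before averaging matters for batched inputs, where shorter sequences are padded and the padding vectors would otherwise distort the mean.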
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("lochhonest/modernbert-finetuned-for-sas")
+# Run inference
+sentences = [
+    'In nearly all cases, how many source and background region spectra are supplied for the RGS?',
+    'RGS spectral products\n\nThis section describes the spectral data products to be generated from\npointed observations.\n\nSource and background region spectra and a background-subtracted source\nspectrum are supplied for the brightest point sources in the RGS (in\nnearly all cases this is just one source). Spectral response matrices\nare also supplied.\n',
+    "-   This extension gives the good time intervals for the event list.\n\n-   There is one extension per CCD in the relevant mode (IMAGING or\n    TIMING) during the exposure.\n\n-   The following keywords are present:\n\n        HDUCLASS= 'OGIP    '           / format conforms to OGIP standard\n        HDUCLAS1= 'GTI     '           / table contains Good Time Intervals\n        HDUCLAS2= 'STANDARD'           / standard Good Time Interval table\n\n-   This extension contains the following columns:\n\n      Name    Type          Description\n      ------- ------------- --------------------------------\n      START   8-byte REAL   seconds (since reference time)\n      STOP    8-byte REAL   seconds (since reference time)\n",
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
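+Because the architecture ends with a `Normalize()` module, the cosine similarity between two embeddings reduces to their dot product. The sketch below ranks passages for a query under that assumption. `rank_by_cosine` is a hypothetical helper, and the 3-dimensional vectors are made-up stand-ins for the output of `model.encode`, not real model output.
+```python
+def dot(a, b):
+    """Dot product; equals cosine similarity when both inputs are unit-length."""
+    return sum(x * y for x, y in zip(a, b))
+
+
+def rank_by_cosine(query_vec, passage_vecs):
+    """Return (index, score) pairs sorted from most to least similar."""
+    scores = [(i, dot(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
+    return sorted(scores, key=lambda pair: pair[1], reverse=True)
+
+
+# Made-up unit vectors standing in for model.encode(...) output.
+query = [1.0, 0.0, 0.0]
+passages = [
+    [0.0, 1.0, 0.0],  # orthogonal to the query
+    [0.6, 0.8, 0.0],  # partially aligned with the query
+    [1.0, 0.0, 0.0],  # same direction as the query
+]
+
+ranking = rank_by_cosine(query, passages)
+print([index for index, _ in ranking])
+```
+With the real model, `query` and `passages` would come from `model.encode`, and `model.similarity` computes the same scores in a single call.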
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 3,619 training samples
+* Columns: <code>anchor</code> and <code>positive</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                           | positive                                                                             |
+  |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                               |
+  | details | <ul><li>min: 2 tokens</li><li>mean: 15.7 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 411.84 tokens</li><li>max: 3755 tokens</li></ul> |
+* Samples:
+  | anchor                                                                     | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+  |:---------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>What is the purpose of the document described in the preface?</code> | <code>Preface<br><br>This is the reference document describing the individual XMM-Newton<br>Survey Science Centre (SSC) data product files. It is intended to be of<br>use to software developers, archive administrators and to scientists<br>analysing XMM-Newton data. Please see the SSC data products Interface<br>Control Document (XMM-SOC-ICD-0006-SSC, issue 4.0) for a description of<br>the product group files and other related files that are sent to the<br>SOC.<br><br>This version (4.3) includes changes related to the upgrade to SAS16.0 in<br>the processing pipeline originally developped in 2012 to uniformly<br>process all the XMM data at that time, from which the 3XMM catalogue was<br>derived. Revisions and additions since version 4.2 are identified by<br>change bars at the right of each page.<br><br>This document will continue to evolve through subsequent issues, under<br>indirect control from the SAS and SSC configuration control boards.<br><br>This document is the result of the work of many people. Contributors<br>have included:<br><br>Hermann Brunner, G...</code> |
+  | <code>What version of the document is described in the preface?</code>     | <code>Preface<br><br>This is the reference document describing the individual XMM-Newton<br>Survey Science Centre (SSC) data product files. It is intended to be of<br>use to software developers, archive administrators and to scientists<br>analysing XMM-Newton data. Please see the SSC data products Interface<br>Control Document (XMM-SOC-ICD-0006-SSC, issue 4.0) for a description of<br>the product group files and other related files that are sent to the<br>SOC.<br><br>This version (4.3) includes changes related to the upgrade to SAS16.0 in<br>the processing pipeline originally developped in 2012 to uniformly<br>process all the XMM data at that time, from which the 3XMM catalogue was<br>derived. Revisions and additions since version 4.2 are identified by<br>change bars at the right of each page.<br><br>This document will continue to evolve through subsequent issues, under<br>indirect control from the SAS and SSC configuration control boards.<br><br>This document is the result of the work of many people. Contributors<br>have included:<br><br>Hermann Brunner, G...</code> |
+  | <code>What is the main change in version 4.3 of the document?</code>       | <code>Preface<br><br>This is the reference document describing the individual XMM-Newton<br>Survey Science Centre (SSC) data product files. It is intended to be of<br>use to software developers, archive administrators and to scientists<br>analysing XMM-Newton data. Please see the SSC data products Interface<br>Control Document (XMM-SOC-ICD-0006-SSC, issue 4.0) for a description of<br>the product group files and other related files that are sent to the<br>SOC.<br><br>This version (4.3) includes changes related to the upgrade to SAS16.0 in<br>the processing pipeline originally developped in 2012 to uniformly<br>process all the XMM data at that time, from which the 3XMM catalogue was<br>derived. Revisions and additions since version 4.2 are identified by<br>change bars at the right of each page.<br><br>This document will continue to evolve through subsequent issues, under<br>indirect control from the SAS and SSC configuration control boards.<br><br>This document is the result of the work of many people. Contributors<br>have included:<br><br>Hermann Brunner, G...</code> |
+* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "get_similarity"
+  }
+  ```
+### Evaluation Dataset
+#### Unnamed Dataset
+* Size: 30 evaluation samples
+* Columns: <code>anchor</code> and <code>positive</code>
+* Approximate statistics based on the first 30 samples:
+  |         | anchor                                                                           | positive                                                                             |
+  |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                               |
+  | details | <ul><li>min: 8 tokens</li><li>mean: 16.0 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 642.47 tokens</li><li>max: 6152 tokens</li></ul> |
+* Samples:
+  | anchor                                                                                                             | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+  |:-------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>What is the purpose of the PPS cross-correlation products?</code>                                            | <code>General cross-correlation products<br><br>These PPS cross-correlation products list the names of all catalogues<br>searched (both around each EPIC position and in the whole EPIC field)<br>and describe the format of their output.<br></code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+  | <code>What are the task parameters of rgssources?</code>                                                           | <code>rgssources<br>## Parameters<br><br>  \label{rgssources:description:parameters}<br>  <br>  **filemode}	{modify** (Optional): no<br>(Type: <br>    Controls whether the task opens a previous source list for editing or creates a new one.<br>    }<br>  \optparm{changeprime}	{no}	{boolean}	{yes|no, Default: string}	{modify|create, Range: <br>    Only active in `filemode`=`modify'. Unless this parameter is set, the previous prime source index number is retained.<br>    }<br>  \optparm{changeattitude)	{boolean}	{yes|no}{<br>    Only active in `filemode`=`modify'. Unless this parameter is set, the previous attitude (stored in the header) is retained.<br>    }<br>  **srclist}	{rgsset.ds** (Mandatory): yes<br>(Type: <br>    The name of the rgs source list. If `filemode`=`create', the output is written to this file. If there is an existing file of this name, it will be overwritten unless SAS\_CLOBBER is unset. If `filemode`=`modify', the task looks for an existing source list of this name and modifies it.<br>  }<br>  **instexpid}	{}	{string}	{, Default:...</code> |
+  | <code>How many stars were used in the U-filter analysis for the G153 pointing to create the distortion map?</code> | <code>OM distortion<br><br>The  OM<br>(http://www.cosmos.esa.int/web/xmm-newton/technical-details-om) optics,<br>filters and (primarily) the detector system result in a certain amount<br>of image distortion. This effect can be corrected with a “distortion<br>map”, by comparing the expected position with the measured position for<br>a large number of stars in the OM<br>(http://www.cosmos.esa.int/web/xmm-newton/technical-details-om) field of<br>view. A U-filter analysis has been performed on the G153 pointing with<br>813 stars. The effect of applying this correction is shown in<br>Fig. [fig:uhb:distmap]. A positional r.m.s. accuracy of 0.5 − 1.5 arcsec<br>is obtained. The distortion map has been entered into the appropriate<br>CCF file and is used in http://www.cosmos.esa.int/web/xmm-newton/sas<br>(http://www.cosmos.esa.int/web/xmm-newton/sas).<br></code>                                                                                                                                                                                                                 |
+* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "get_similarity"
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 4
+- `num_train_epochs`: 2
+- `lr_scheduler_type`: constant
+- `warmup_ratio`: 0.1
+- `bf16`: True
+- `batch_sampler`: no_duplicates
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 4
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 2
+- `max_steps`: -1
+- `lr_scheduler_type`: constant
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+</details>
+### Training Logs
+| Epoch  | Step | Training Loss | Validation Loss |
+|:------:|:----:|:-------------:|:---------------:|
+| 0.2203 | 50   | 0.2209        | -               |
+| 0.4405 | 100  | 0.1635        | 0.0402          |
+| 0.6608 | 150  | 0.1759        | -               |
+| 0.8811 | 200  | 0.1674        | 0.1307          |
+| 1.1013 | 250  | 0.1134        | -               |
+| 1.3216 | 300  | 0.0809        | 0.0441          |
+| 1.5419 | 350  | 0.0571        | -               |
+| 1.7621 | 400  | 0.077         | 0.0268          |
+| 1.9824 | 450  | 0.0557        | -               |
+### Framework Versions
+- Python: 3.10.14
+- Sentence Transformers: 3.4.1
+- Transformers: 4.48.2
+- PyTorch: 2.6.0+cu124
+- Accelerate: 1.3.0
+- Datasets: 3.3.1
+- Tokenizers: 0.21.0
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### CachedMultipleNegativesRankingLoss
+```bibtex
+@misc{gao2021scaling,
+    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
+    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
+    year={2021},
+    eprint={2101.06983},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
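
Reviewer note on the loss reported in the README above: the card lists `CachedMultipleNegativesRankingLoss` with `scale` 20.0. The following is a minimal sketch of the in-batch negatives objective that this loss family optimizes, written in plain Python rather than with the sentence-transformers API. It assumes cosine similarity (matching `similarity_fn_name` in `config_sentence_transformers.json`) and ignores the gradient-caching step, which reduces memory use but is not expected to change the loss value. The function names and toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch acts as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-dimensional "embeddings" (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive points at its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive points at the other anchor

loss_aligned = in_batch_ranking_loss(anchors, aligned)
loss_swapped = in_batch_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

Because every other positive in the batch is treated as a negative, a passage that appears twice in one batch would be penalized as a false negative. The training samples shown in the README pair several different anchors with the same preface passage, which is a plausible reason for the `no_duplicates` batch sampler listed in the hyperparameters.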

config.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "_name_or_path": "nomic-ai/modernbert-embed-base",
+  "architectures": [
+    "ModernBertModel"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": false,
+  "classifier_dropout": 0.0,
+  "classifier_pooling": "mean",
+  "cls_token_id": 50281,
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
+  "global_rope_theta": 160000.0,
+  "gradient_checkpointing": false,
+  "hidden_activation": "gelu",
+  "hidden_size": 768,
+  "initializer_cutoff_factor": 2.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "layer_norm_eps": 1e-05,
+  "local_attention": 128,
+  "local_rope_theta": 10000.0,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
+  "position_embedding_type": "absolute",
+  "reference_compile": true,
+  "repad_logits_with_grad": false,
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.48.2",
+  "vocab_size": 50368
+}
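
Reviewer note on the attention settings in `config.json`: `global_attn_every_n_layers` is 3, `num_hidden_layers` is 22 and `local_attention` is 128. The sketch below derives which layers would use global attention and which would use the 128-token local window. It assumes that layer 0 and every third layer after it are the global ones, which is how the Hugging Face ModernBERT implementation is understood to apply this setting; the config itself does not state the rule, so treat the layout as an assumption. The helper name is illustrative.

```python
# Values copied from config.json above.
NUM_HIDDEN_LAYERS = 22
GLOBAL_ATTN_EVERY_N_LAYERS = 3
LOCAL_ATTENTION = 128
HIDDEN_SIZE = 768
NUM_ATTENTION_HEADS = 12


def attention_kind(layer_index, every_n=GLOBAL_ATTN_EVERY_N_LAYERS):
    """Classify a layer as 'global' or 'local'.

    Assumption: layers whose index is a multiple of `every_n` attend globally,
    all others use a sliding local window.
    """
    return "global" if layer_index % every_n == 0 else "local"


layout = [attention_kind(i) for i in range(NUM_HIDDEN_LAYERS)]
global_layers = [i for i, kind in enumerate(layout) if kind == "global"]
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS

print("global layers:", global_layers)
print("local layers:", layout.count("local"), "with window", LOCAL_ATTENTION)
print("per-head dimension:", head_dim)
```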

config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.4.1",
+    "transformers": "4.48.2",
+    "pytorch": "2.6.0+cu124"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b9a52371c20459c83ab3b9e4dcba56b63c2b89ee64f564bceda45edf14a5516
+size 596070136
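
Reviewer note on the pointer above: the three added lines are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and uses the recorded size to get a rough parameter count, assuming every weight is stored as 4-byte float32 (consistent with `torch_dtype` in `config.json`). The result slightly overestimates the true count because the safetensors file also carries a small metadata header. The function name is illustrative.

```python
# Pointer content copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8b9a52371c20459c83ab3b9e4dcba56b63c2b89ee64f564bceda45edf14a5516
size 596070136
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)

# Assumption: all tensors are float32, so 4 bytes per parameter.
BYTES_PER_FLOAT32 = 4
approx_params = pointer["size_bytes"] // BYTES_PER_FLOAT32

print(pointer["hash_algorithm"], pointer["size_bytes"])
print(f"roughly {approx_params / 1e6:.0f}M parameters")
```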

modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
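
Reviewer note on the module chain above: `modules.json` runs a Transformer, then Pooling, then Normalize, and `1_Pooling/config.json` enables only `pooling_mode_mean_tokens`. The sketch below shows what the last two stages are expected to compute for one sentence, in plain Python instead of the sentence-transformers modules. The token vectors, mask and function names are hypothetical, and the real modules operate on batched tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vector):
                sums[j] += value
    # Guard against an all-padding input.
    return [s / max(kept, 1) for s in sums]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


# Two real tokens followed by one padding token (toy 2-dimensional vectors).
tokens = [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Since the final vectors have unit length, the cosine similarity named in `config_sentence_transformers.json` reduces to a plain dot product between embeddings.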

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 8192,
+  "do_lower_case": false
+}

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,945 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "|||IP_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "|||EMAIL_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50278": {
+      "content": "|||PHONE_NUMBER|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50279": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50280": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50281": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50282": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50283": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50284": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50285": {
+      "content": "[unused0]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50286": {
+      "content": "[unused1]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50287": {
+      "content": "[unused2]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50288": {
+      "content": "[unused3]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50289": {
+      "content": "[unused4]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50290": {
+      "content": "[unused5]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50291": {
+      "content": "[unused6]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50292": {
+      "content": "[unused7]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50293": {
+      "content": "[unused8]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50294": {
+      "content": "[unused9]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50295": {
+      "content": "[unused10]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50296": {
+      "content": "[unused11]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50297": {
+      "content": "[unused12]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50298": {
+      "content": "[unused13]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50299": {
+      "content": "[unused14]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50300": {
+      "content": "[unused15]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50301": {
+      "content": "[unused16]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50302": {
+      "content": "[unused17]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50303": {
+      "content": "[unused18]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50304": {
+      "content": "[unused19]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50305": {
+      "content": "[unused20]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50306": {
+      "content": "[unused21]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50307": {
+      "content": "[unused22]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50308": {
+      "content": "[unused23]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50309": {
+      "content": "[unused24]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50310": {
+      "content": "[unused25]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50311": {
+      "content": "[unused26]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50312": {
+      "content": "[unused27]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50313": {
+      "content": "[unused28]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50314": {
+      "content": "[unused29]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50315": {
+      "content": "[unused30]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50316": {
+      "content": "[unused31]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50317": {
+      "content": "[unused32]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50318": {
+      "content": "[unused33]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50319": {
+      "content": "[unused34]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50320": {
+      "content": "[unused35]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50321": {
+      "content": "[unused36]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50322": {
+      "content": "[unused37]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50323": {
+      "content": "[unused38]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50324": {
+      "content": "[unused39]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50325": {
+      "content": "[unused40]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50326": {
+      "content": "[unused41]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50327": {
+      "content": "[unused42]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50328": {
+      "content": "[unused43]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50329": {
+      "content": "[unused44]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50330": {
+      "content": "[unused45]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50331": {
+      "content": "[unused46]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50332": {
+      "content": "[unused47]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50333": {
+      "content": "[unused48]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50334": {
+      "content": "[unused49]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50335": {
+      "content": "[unused50]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50336": {
+      "content": "[unused51]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50337": {
+      "content": "[unused52]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50338": {
+      "content": "[unused53]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50339": {
+      "content": "[unused54]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50340": {
+      "content": "[unused55]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50341": {
+      "content": "[unused56]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50342": {
+      "content": "[unused57]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50343": {
+      "content": "[unused58]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50344": {
+      "content": "[unused59]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50345": {
+      "content": "[unused60]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50346": {
+      "content": "[unused61]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50347": {
+      "content": "[unused62]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50348": {
+      "content": "[unused63]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50349": {
+      "content": "[unused64]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50350": {
+      "content": "[unused65]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50351": {
+      "content": "[unused66]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50352": {
+      "content": "[unused67]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50353": {
+      "content": "[unused68]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50354": {
+      "content": "[unused69]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50355": {
+      "content": "[unused70]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50356": {
+      "content": "[unused71]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50357": {
+      "content": "[unused72]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50358": {
+      "content": "[unused73]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50359": {
+      "content": "[unused74]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50360": {
+      "content": "[unused75]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50361": {
+      "content": "[unused76]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50362": {
+      "content": "[unused77]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50363": {
+      "content": "[unused78]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50364": {
+      "content": "[unused79]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50365": {
+      "content": "[unused80]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50366": {
+      "content": "[unused81]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50367": {
+      "content": "[unused82]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "[UNK]"
+}
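The entries above map token ids to reserved `[unusedN]` placeholders, each marked `"special": false`, followed by the tokenizer-level settings (`cls_token`, `pad_token`, `model_max_length`, and so on). As a rough sketch of how a reviewer might sanity-check such a file, the snippet below parses a small excerpt with the standard library only. The top-level key name `added_tokens_decoder` is an assumption (it lies outside the lines shown here), and `summarize` and `CONFIG_EXCERPT` are hypothetical names introduced for illustration.

```python
import json

# Excerpt in the same shape as the config above. The "added_tokens_decoder"
# key is assumed; only two of the reserved entries are reproduced here.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "50305": {"content": "[unused20]", "lstrip": false, "normalized": true,
              "rstrip": false, "single_word": false, "special": false},
    "50367": {"content": "[unused82]", "lstrip": false, "normalized": true,
              "rstrip": false, "single_word": false, "special": false}
  },
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
"""


def summarize(config):
    """Collect reserved placeholder ids and the configured maximum length."""
    tokens = config.get("added_tokens_decoder", {})
    # Keep only non-special placeholders whose content looks like "[unusedN]".
    reserved_ids = sorted(
        int(token_id)
        for token_id, entry in tokens.items()
        if entry["content"].startswith("[unused") and not entry["special"]
    )
    return {
        "reserved_ids": reserved_ids,
        "model_max_length": config["model_max_length"],
        "pad_token": config["pad_token"],
    }


summary = summarize(json.loads(CONFIG_EXCERPT))
print(summary)
```

Run against the full file instead of the excerpt, the same function would list every reserved id, which makes gaps or accidental `"special": true` flags easier to spot than scrolling through the diff.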