Push model using huggingface_hub.

Files changed (3)

README.md +244 -0
definition.json +1 -0
parameters +0 -0

README.md ADDED

	@@ -0,0 +1,244 @@

+---
+language:
+- en
+license: apache-2.0
+library_name: llm2ner
+base_model: EleutherAI/pythia-14m
+tags:
+- ner
+- span-detection
+- llm
+- pytorch
+pipeline_tag: token-classification
+model_name: ToMMeR-pythia-14m_L3_R64
+source: https://github.com/VictorMorand/llm2ner
+paper: https://arxiv.org/abs/2510.19410
+---
+# ToMMeR-pythia-14m_L3_R64
+ToMMeR is a lightweight probing model that extracts emergent mention-detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a set of 13 NER benchmarks.
+## Checkpoint Details
+| Property  | Value |
+|-----------|-------|
+| Base LLM  | `EleutherAI/pythia-14m` |
+| Layer     | 3 |
+| #Params   | 16.5K |
+# Usage
+## Installation
+Our code can be installed with pip directly from the Git repository. Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details.
+```bash
+pip install git+https://github.com/VictorMorand/llm2ner.git
+```
+## Fancy Outputs
+```python
+import llm2ner
+from llm2ner import ToMMeR
+tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-pythia-14m_L3_R64")
+# Load the backbone LLM, optionally cutting the unused layers to save GPU memory.
+llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
+tommer.to(llm.device)
+text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "
+# Fancy interactive output
+outputs = llm2ner.plotting.demo_inference( text, tommer, llm,
+    decoding_strategy="threshold",  # or "greedy" for flat segmentation
+    threshold=0.5, # default 50%
+    show_attn=True,
+)
+```
+<div>
+<span class="tex2jax_ignore"><div class="spans" style="line-height: 2.5; direction: ltr">
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    Large
+    <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 77px;">
+    language
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 57px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 57px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 77px;">
+    models
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 57px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+</span>
+are awesome . While trained on
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    language
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    modeling
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+</span>
+, they exhibit
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    emergent
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    abilities
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+</span>
+that make them suitable for a wide range of
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    tasks
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+, including
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    Named
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    Entity
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+</span>
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    Recognition
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+</span>
+(
+<span style="font-weight: bold; display: inline-block; position: relative; height: 60px;">
+    NER
+<span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;">
+</span>
+<span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;">
+    <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px">
+        PRED
+    </span>
+</span>
+</span>
+) . </div></span>
+</div>
+## Raw inference
+By default, ToMMeR outputs span probabilities, but we also provide built-in options for decoding entities.
+- Inputs:
+  - tokens (batch, seq): token IDs to process,
+  - model: the LLM to extract representations from.
+- Outputs: a (batch, seq, seq) matrix of span scores (masked outside valid spans).
+```python
+import llm2ner
+from llm2ner import ToMMeR
+tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-pythia-14m_L3_R64")
+# Load the backbone LLM, optionally cutting the unused layers to save GPU memory.
+llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
+tommer.to(llm.device)
+# --- Raw inference ---
+text = ["Large language models are awesome"]
+print(f"Input text: {text[0]}")
+# Tokenize into shape (1, seq_len) and move the tokens to the LLM's device.
+tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)
+# Raw span scores, shape (batch_size, seq_len, seq_len).
+output = tommer.forward(tokens, llm)
+print(f"Raw Output shape: {output.shape}")
+# Decode entities with the chosen strategy; each entity is a (begin, end) pair, end inclusive.
+entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
+str_entities = [llm.tokenizer.decode(tokens[0, b:e+1]) for b, e in entities[0]]
+print(f"Predicted entities: {str_entities}")
+# Output:
+# Input text: Large language models are awesome
+# Raw Output shape: torch.Size([1, 6, 6])
+# Predicted entities: ['Large language models']
+```
+Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details and a demo notebook.
+## Evaluation Results
+| dataset             |   precision |   recall |     f1 |   n_samples |
+|---------------------|-------------|----------|--------|-------------|
+| MultiNERD           |      0.1042 |   0.9525 | 0.1879 |      154144 |
+| CoNLL 2003          |      0.1351 |   0.6980 | 0.2264 |       16493 |
+| CrossNER_politics   |      0.1356 |   0.9448 | 0.2372 |        1389 |
+| CrossNER_AI         |      0.1584 |   0.9233 | 0.2704 |         879 |
+| CrossNER_literature |      0.1512 |   0.8993 | 0.2589 |         916 |
+| CrossNER_science    |      0.1498 |   0.9311 | 0.2580 |        1193 |
+| CrossNER_music      |      0.1563 |   0.9235 | 0.2673 |         945 |
+| ncbi                |      0.0731 |   0.8698 | 0.1348 |        3952 |
+| FabNER              |      0.2017 |   0.8154 | 0.3233 |       13681 |
+| WikiNeural          |      0.0981 |   0.9354 | 0.1776 |       92672 |
+| GENIA_NER           |      0.1441 |   0.9277 | 0.2494 |       16563 |
+| ACE 2005            |      0.1450 |   0.4226 | 0.2159 |        8230 |
+| Ontonotes           |      0.1274 |   0.7321 | 0.2171 |       42193 |
+| Aggregated          |      0.1137 |   0.8897 | 0.2017 |      353250 |
+| Mean                |      0.1369 |   0.8443 | 0.2326 |      353250 |
+## Citation
+If you use this model or the approach, please cite the associated paper:
+```
+@misc{morand2025tommerefficiententity,
+      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models},
+      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
+      year={2025},
+      eprint={2510.19410},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2510.19410},
+}
+```
+## License
+Apache-2.0 (see repository for full text).
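
Note on the README's raw-inference section: it says the model returns a (batch, seq, seq) matrix of span scores that is then decoded with either a "threshold" or a "greedy" (flat) strategy. The plain-Python sketch below illustrates what those two strategies could look like on a single toy matrix. It is not llm2ner's implementation: the helpers `decode_threshold` and `decode_greedy` are hypothetical, the `[start][end]` index order is an assumption (the README does not state it), and the scores are made up.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def decode_threshold(scores: List[List[float]], threshold: float = 0.5) -> List[Span]:
    """Keep every span whose score reaches the threshold (overlaps allowed)."""
    n = len(scores)
    return [
        (start, end)
        for start in range(n)
        for end in range(start, n)  # only spans with start <= end are valid
        if scores[start][end] >= threshold
    ]


def decode_greedy(scores: List[List[float]], threshold: float = 0.5) -> List[Span]:
    """Take spans by decreasing score, skipping any that overlap a kept span."""
    candidates = sorted(
        decode_threshold(scores, threshold),
        key=lambda span: scores[span[0]][span[1]],
        reverse=True,
    )
    kept: List[Span] = []
    for start, end in candidates:
        # Two spans are disjoint when one ends before the other starts.
        if all(end < k_start or start > k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


# Toy 4-token example; assumed layout is toy_scores[start][end].
toy_scores = [
    [0.6, 0.2, 0.9, 0.0],
    [0.0, 0.7, 0.8, 0.1],
    [0.0, 0.0, 0.3, 0.1],
    [0.0, 0.0, 0.0, 0.1],
]
nested = decode_threshold(toy_scores)
flat = decode_greedy(toy_scores)
print(nested)  # every span above the threshold, nested ones included
print(flat)    # non-overlapping spans only
```

With thresholding alone, nested spans such as a single token inside a longer mention can all be returned, which fits the overlapping highlights in the README's rendered demo. The greedy variant trades that for a flat segmentation.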

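Note on the README's evaluation table: the f1 column can be rechecked from the precision and recall columns with the usual harmonic-mean formula, and the "Mean" row matches the unweighted average over the 13 datasets. The sketch below redoes that arithmetic with the values copied from the table. The "Aggregated" row cannot be reproduced from the table alone. It is presumably computed over pooled predictions, but that is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, precision, recall, reported f1), copied from the README table.
rows = [
    ("MultiNERD", 0.1042, 0.9525, 0.1879),
    ("CoNLL 2003", 0.1351, 0.6980, 0.2264),
    ("CrossNER_politics", 0.1356, 0.9448, 0.2372),
    ("CrossNER_AI", 0.1584, 0.9233, 0.2704),
    ("CrossNER_literature", 0.1512, 0.8993, 0.2589),
    ("CrossNER_science", 0.1498, 0.9311, 0.2580),
    ("CrossNER_music", 0.1563, 0.9235, 0.2673),
    ("ncbi", 0.0731, 0.8698, 0.1348),
    ("FabNER", 0.2017, 0.8154, 0.3233),
    ("WikiNeural", 0.0981, 0.9354, 0.1776),
    ("GENIA_NER", 0.1441, 0.9277, 0.2494),
    ("ACE 2005", 0.1450, 0.4226, 0.2159),
    ("Ontonotes", 0.1274, 0.7321, 0.2171),
]

# Largest gap between the recomputed and the reported F1 (rounding noise only).
max_f1_gap = max(abs(f1_score(p, r) - f1) for _, p, r, f1 in rows)

# Unweighted (macro) averages, to compare with the table's "Mean" row.
macro_precision = sum(p for _, p, _, _ in rows) / len(rows)
macro_recall = sum(r for _, _, r, _ in rows) / len(rows)
macro_f1 = sum(f1 for _, _, _, f1 in rows) / len(rows)

print(f"max F1 gap: {max_f1_gap:.4f}")
print(f"macro P/R/F1: {macro_precision:.4f} {macro_recall:.4f} {macro_f1:.4f}")
```

The low precision next to the high recall is consistent with the README's framing of the checkpoint as a recall-oriented mention detector.
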
definition.json ADDED

	@@ -0,0 +1 @@

+ {"objects": [{"id": 140521606039968, "module": "llm2ner.models.tommer", "type": "ToMMeR", "typename": "llm2ner.models.tommer.ToMMeR", "identifier": "e7eea0918ec72ef43955e283e0392f3881718ca5c110010b8efd0abb8f07c4fa", "fields": {"llm_name": "EleutherAI/pythia-14m", "layer": 3, "rank": 64, "causal_mask": true, "sliding_window": 25, "use_cosine": true, "normalize_scores": ""}}, {"id": 140521599810624, "module": "llm2ner.xpmModel", "type": "xpmTorchHubModule.Loader", "typename": "llm2ner.xpmModel.xpmTorchHubModule.Loader", "identifier": "fc25d7e5cecfc9368e7adb655a4907b8022d972a917cc1f9c9f48428307370a7", "fields": {"model": {"type": "python", "value": 140521606039968}, "parameters": {"type": "path.serialized", "value": "parameters", "is_folder": false}}}], "data": [{"type": "python", "value": 140521606039968}, [{"type": "python", "value": 140521599810624}]]}
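
Note on definition.json: the file stores the checkpoint's hyperparameters as the `fields` of the object whose `type` is `ToMMeR`. As a rough illustration, the sketch below reads those fields with Python's standard `json` module from a trimmed copy of the content above (the loader object and the `data` entry are omitted). The helper `find_fields` is hypothetical and is not part of llm2ner, which handles loading itself through `ToMMeR.from_pretrained`.

```python
import json

# Trimmed copy of the committed definition.json: only the ToMMeR object is kept.
definition_text = """
{"objects": [{"id": 140521606039968,
              "module": "llm2ner.models.tommer",
              "type": "ToMMeR",
              "typename": "llm2ner.models.tommer.ToMMeR",
              "fields": {"llm_name": "EleutherAI/pythia-14m", "layer": 3,
                         "rank": 64, "causal_mask": true, "sliding_window": 25,
                         "use_cosine": true, "normalize_scores": ""}}]}
"""


def find_fields(definition: dict, type_name: str) -> dict:
    """Return the 'fields' dict of the first object with the given 'type'."""
    for obj in definition["objects"]:
        if obj.get("type") == type_name:
            return obj["fields"]
    raise KeyError(f"no object of type {type_name!r}")


config = find_fields(json.loads(definition_text), "ToMMeR")
print(config["llm_name"], config["layer"], config["rank"])
```

The `layer` and `rank` values appear to line up with the `L3` and `R64` suffixes in the model name `ToMMeR-pythia-14m_L3_R64`.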

parameters ADDED

Binary file (68.9 kB).