Initialize

Files changed (8)

README.md +90 -0
config.json +70 -0
pytorch_model.bin +3 -0
special_tokens_map.json +7 -0
tf_model.h5 +3 -0
tokenizer.json +0 -0
tokenizer_config.json +13 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,90 @@

+---
+language: fa
+---
+# BertNER
+This model is fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from [ARMAN](https://github.com/HaniehP/PersianNER), [PEYMA](http://nsurl.org/2019-2/tasks/task-7-named-entity-recognition-ner-for-farsi/), and [WikiANN](https://elisa-ie.github.io/wikiann/). The dataset covers ten types of entities:
+- Date (DAT)
+- Event (EVE)
+- Facility (FAC)
+- Location (LOC)
+- Money (MON)
+- Organization (ORG)
+- Percent (PCT)
+- Person (PER)
+- Product (PRO)
+- Time (TIM)
+## Dataset Information
+| Split |   Records |   B-DAT |   B-EVE |   B-FAC |   B-LOC |   B-MON |   B-ORG |   B-PCT |   B-PER |   B-PRO |   B-TIM |   I-DAT |   I-EVE |   I-FAC |   I-LOC |   I-MON |   I-ORG |   I-PCT |   I-PER |   I-PRO |   I-TIM |
+|:------|----------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
+| Train |     29133 |    1423 |    1487 |    1400 |   13919 |     417 |   15926 |     355 |   12347 |    1855 |     150 |    1947 |    5018 |    2421 |    4118 |    1059 |   19579 |     573 |    7699 |    1914 |     332 |
+| Valid |      5142 |     267 |     253 |     250 |    2362 |     100 |    2651 |      64 |    2173 |     317 |      19 |     373 |     799 |     387 |     717 |     270 |    3260 |     101 |    1382 |     303 |      35 |
+| Test  |      6049 |     407 |     256 |     248 |    2886 |      98 |    3216 |      94 |    2646 |     318 |      43 |     568 |     888 |     408 |     858 |     263 |    3967 |     141 |    1707 |     296 |      78 |
+## Evaluation
+The following tables summarize the scores obtained by the model, both overall and per entity class.
+**Overall**
+|    Model   | accuracy | precision |  recall  |    f1    |
+|:----------:|:--------:|:---------:|:--------:|:--------:|
+|    Bert    | 0.995086 |  0.953454 | 0.961113 | 0.957268 |
+**Per entity**
+|     	| number 	| precision 	|  recall  	|    f1    	|
+|:---:	|:------:	|:---------:	|:--------:	|:--------:	|
+| DAT 	|   407  	|  0.860636 	| 0.864865 	| 0.862745 	|
+| EVE 	|   256  	|  0.969582 	| 0.996094 	| 0.982659 	|
+| FAC 	|   248  	|  0.976190 	| 0.991935 	| 0.984000 	|
+| LOC 	|  2884  	|  0.970232 	| 0.971914 	| 0.971072 	|
+| MON 	|   98   	|  0.905263 	| 0.877551 	| 0.891192 	|
+| ORG 	|  3216  	|  0.939125 	| 0.954602 	| 0.946800 	|
+| PCT 	|   94   	|  1.000000 	| 0.968085 	| 0.983784 	|
+| PER 	|  2645  	|  0.965244 	| 0.965974 	| 0.965608 	|
+| PRO 	|   318  	|  0.981481 	| 1.000000 	| 0.990654 	|
+| TIM 	|   43   	|  0.692308 	| 0.837209 	| 0.757895 	|
+## How To Use
+You can use this model with the Transformers pipeline for NER.
+### Installing requirements
+```bash
+pip install transformers
+```
+### How to predict using pipeline
+```python
+from transformers import AutoTokenizer
+from transformers import AutoModelForTokenClassification  # for pytorch
+from transformers import TFAutoModelForTokenClassification  # for tensorflow
+from transformers import pipeline
+model_name_or_path = "HooshvareLab/bert-fa-zwnj-base-ner"
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Pytorch
+# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Tensorflow
+nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."
+ner_results = nlp(example)
+print(ner_results)
+```
+## Questions?
+Post a GitHub issue on the [ParsNER Issues](https://github.com/hooshvare/parsner/issues) repo.
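
As a quick sanity check on the Evaluation tables in the README above, the sketch below recomputes F1 as the harmonic mean of precision and recall for a few rows. The precision, recall, and F1 values are copied from those tables; the helper name `f1_score` is a hypothetical name used only for this illustration and is not part of the model repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the README evaluation tables.
reported = {
    "overall": (0.953454, 0.961113, 0.957268),
    "DAT": (0.860636, 0.864865, 0.862745),
    "TIM": (0.692308, 0.837209, 0.757895),
}

for name, (p, r, f1) in reported.items():
    # The recomputed value should agree with the reported one up to rounding.
    print(f"{name}: recomputed={f1_score(p, r):.6f} reported={f1:.6f}")
```

The rows checked here agree with the reported F1 values up to rounding, which suggests the F1 column is derived from the precision and recall columns in the usual way.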

config.json ADDED

	@@ -0,0 +1,70 @@

+{
+  "architectures": [
+    "BertForTokenClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "finetuning_task": "ner",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "O",
+    "1": "B-DAT",
+    "2": "B-EVE",
+    "3": "B-FAC",
+    "4": "B-LOC",
+    "5": "B-MON",
+    "6": "B-ORG",
+    "7": "B-PCT",
+    "8": "B-PER",
+    "9": "B-PRO",
+    "10": "B-TIM",
+    "11": "I-DAT",
+    "12": "I-EVE",
+    "13": "I-FAC",
+    "14": "I-LOC",
+    "15": "I-MON",
+    "16": "I-ORG",
+    "17": "I-PCT",
+    "18": "I-PER",
+    "19": "I-PRO",
+    "20": "I-TIM"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "B-DAT": 1,
+    "B-EVE": 2,
+    "B-FAC": 3,
+    "B-LOC": 4,
+    "B-MON": 5,
+    "B-ORG": 6,
+    "B-PCT": 7,
+    "B-PER": 8,
+    "B-PRO": 9,
+    "B-TIM": 10,
+    "I-DAT": 11,
+    "I-EVE": 12,
+    "I-FAC": 13,
+    "I-LOC": 14,
+    "I-MON": 15,
+    "I-ORG": 16,
+    "I-PCT": 17,
+    "I-PER": 18,
+    "I-PRO": 19,
+    "I-TIM": 20,
+    "O": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.5.0.dev0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 42000
+}
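
The `id2label` map in this config follows a BIO scheme: `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks tokens outside any entity. The sketch below shows one way predicted label ids could be merged into entity spans. It is a minimal illustration, not the pipeline's own aggregation logic: `group_entities` is a hypothetical helper, the example tokens are made up, only a subset of the label map is included, and tokens are joined with spaces without handling word-piece prefixes.

```python
from typing import Dict, List, Optional, Tuple

# Subset of the id2label map from config.json (keys converted to int).
ID2LABEL: Dict[int, str] = {
    0: "O",
    6: "B-ORG",
    8: "B-PER",
    16: "I-ORG",
    18: "I-PER",
}


def group_entities(tokens: List[str], label_ids: List[int]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, label_id in zip(tokens, label_ids):
        prefix, _, ent_type = ID2LABEL[label_id].partition("-")
        if prefix == "I" and ent_type == current_type:
            # Continuation of the currently open span.
            current_tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # A stray I- tag with no matching open span is treated as a start.
            current_type, current_tokens = ent_type, [token]
        else:
            current_type, current_tokens = None, []

    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical tokens and label ids, for illustration only.
tokens = ["Ali", "Karimi", "joined", "Persepolis", "FC"]
label_ids = [8, 18, 0, 6, 16]
print(group_entities(tokens, label_ids))
```

In practice the Transformers NER pipeline can do this grouping itself; the sketch only makes the meaning of the `B-`/`I-` label pairs concrete.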

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0060567e2193d40844f08ffa1b5e73bdfa3e74257aaccc616ffcb1e5442d323c
+size 470980151
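
The three lines above are a Git LFS pointer, not the weights themselves: the real file is stored separately and identified by its SHA-256 digest and byte size. The sketch below parses such a pointer into its fields, using the exact text from this diff. The function name `parse_lfs_pointer` and the returned dictionary keys are hypothetical choices for this illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the pytorch_model.bin diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0060567e2193d40844f08ffa1b5e73bdfa3e74257aaccc616ffcb1e5442d323c
size 470980151
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

After downloading the actual file, its size and SHA-256 digest can be compared against these fields to confirm the download is complete.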

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+    "unk_token": "[UNK]",
+    "sep_token": "[SEP]",
+    "pad_token": "[PAD]",
+    "cls_token": "[CLS]",
+    "mask_token": "[MASK]"
+}

tf_model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9e8f8d228fef2eec9702a355c35daa90c0d7d2b8eef00439c92bbef29d2e13e
+size 471159904

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+    "do_lower_case": false,
+    "unk_token": "[UNK]",
+    "sep_token": "[SEP]",
+    "pad_token": "[PAD]",
+    "cls_token": "[CLS]",
+    "mask_token": "[MASK]",
+    "tokenize_chinese_chars": true,
+    "strip_accents": false,
+    "model_max_length": 512,
+    "special_tokens_map_file": null,
+    "name_or_path": "HooshvareLab/bert-fa-zwnj-base"
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff