Improve model card: Add fill-mask pipeline tag, license, language, and domain tags

This PR improves the model card by:

* Adding the `license: apache-2.0` metadata.
* Specifying the `pipeline_tag: fill-mask`, enabling better discoverability at https://huggingface.co/models?pipeline_tag=fill-mask.
* Including relevant `language: ja` and additional `tags` such as `japanese`, `pharmaceutical`, `bert`, and `continual-pretraining`.
* Adding a direct link to the paper and the GitHub repository at the top of the model card for better visibility.
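For reviewers who want to confirm that the new front matter carries the fields above, here is a minimal sketch. It assumes the metadata block uses only flat `key: value` pairs and `  - item` list entries, which is true of this card. `parse_front_matter`, `missing_metadata`, `EXPECTED` and `EXPECTED_TAGS` are hypothetical helpers written for illustration. They are not part of the Hub or of `huggingface_hub`, and real tooling should use a full YAML parser or the `huggingface_hub` model-card utilities instead.

```python
# Hypothetical checker for the metadata this PR adds. It handles only the
# simple YAML subset used here ("key: value" and "  - item" lines); it is not
# a general YAML parser.

EXPECTED = {
    "library_name": "transformers",
    "license": "apache-2.0",
    "language": "ja",
    "pipeline_tag": "fill-mask",
}
EXPECTED_TAGS = {"japanese", "pharmaceutical", "bert", "continual-pretraining"}


def parse_front_matter(text: str) -> dict:
    """Return the metadata between the first two '---' lines as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' front-matter block")
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List entry that belongs to the most recent key (e.g. "tags:")
            meta.setdefault(current_key, []).append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value starts a list; otherwise store the scalar string
            meta[current_key] = value if value else []
    raise ValueError("front-matter block is not closed with '---'")


def missing_metadata(meta: dict) -> list:
    """List expected fields that are absent or differ from the expected value."""
    problems = [key for key, value in EXPECTED.items() if meta.get(key) != value]
    tags = meta.get("tags")
    if not isinstance(tags, list) or not EXPECTED_TAGS <= set(tags):
        problems.append("tags")
    return problems


NEW_CARD = """---
library_name: transformers
license: apache-2.0
language: ja
pipeline_tag: fill-mask
tags:
  - japanese
  - pharmaceutical
  - bert
  - continual-pretraining
---
# JpharmaBERT
"""

OLD_CARD = """---
library_name: transformers
tags: []
---
# Model Card
"""

print(missing_metadata(parse_front_matter(NEW_CARD)))  # → []
print(missing_metadata(parse_front_matter(OLD_CARD)))  # → ['license', 'language', 'pipeline_tag', 'tags']
```

The old card's `tags: []` is read as the literal string `"[]"`, so the sketch reports `tags` as missing there. That is acceptable for this check, but it is one reason a real YAML parser is preferable.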

README.md +24 -17

@@ -1,16 +1,23 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card
-<!-- Provide a quick summary of what the model is/does. -->
-Our **JpharmaBERT (base)** is a continually pre-trained version of the BERT model ([tohoku-nlp/bert-base-japanese-v3](https://huggingface.co/tohoku-nlp/bert-base-japanese-v3)), further trained on pharmaceutical data — the same dataset used for [eques/jpharmatron](https://huggingface.co/EQUES/JPharmatron-7B).
-# Examoke Usage
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ```python
 import torch
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
@@ -34,25 +41,25 @@ for result in results:
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 We used the same dataset as [eques/jpharmatron](https://huggingface.co/EQUES/JPharmatron-7B) for training our JpharmaBERT, which consists of:
-- Japanese text data (2B tokens) collected from pharmaceutical documents such as academic papers and package inserts
-- English data (8B tokens) obtained from PubMed abstracts
-- Pharmaceutical-related data (1.2B tokens) extracted from the multilingual CC100 dataset
-After removing duplicate entries across these sources, the final dataset contains approximately 9 billion tokens.
 (For details, please refer to our paper about Jpharmatron: [link](https://arxiv.org/abs/2505.16661))
 #### Training Hyperparameters
 The model was continually pre-trained with the following settings:
-- Mask probability: 15%
-- Maximum sequence length: 512 tokens
-- Number of training epochs: 6
-- Learning rate: 1e-4
-- Warm-up steps: 10,000
-- Per-device training batch size: 64
 ## Model Card Authors

 ---
 library_name: transformers
+license: apache-2.0
+language: ja
+pipeline_tag: fill-mask
+tags:
+  - japanese
+  - pharmaceutical
+  - bert
+  - continual-pretraining
 ---
+# JpharmaBERT: A Japanese Language Model for Pharmaceutical NLP
+[📚 Paper](https://huggingface.co/papers/2505.16661) - [💻 Code](https://github.com/EQUES-AI/JpharmaBERT)
+This is the **JpharmaBERT (base)** model, presented in the paper [A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP](https://huggingface.co/papers/2505.16661). It is a continually pre-trained version of the BERT model ([tohoku-nlp/bert-base-japanese-v3](https://huggingface.co/tohoku-nlp/bert-base-japanese-v3)), further trained on pharmaceutical data.
+# Example Usage
 ```python
 import torch
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 ### Training Data
 We used the same dataset as [eques/jpharmatron](https://huggingface.co/EQUES/JPharmatron-7B) for training our JpharmaBERT, which consists of:
+*   Japanese text data (2B tokens) collected from pharmaceutical documents such as academic papers and package inserts
+*   English data (8B tokens) obtained from PubMed abstracts
+*   Pharmaceutical-related data (1.2B tokens) extracted from the multilingual CC100 dataset
+After removing duplicate entries across these sources, the final dataset contains approximately 9 billion tokens.
 (For details, please refer to our paper about Jpharmatron: [link](https://arxiv.org/abs/2505.16661))
 #### Training Hyperparameters
 The model was continually pre-trained with the following settings:
+*   Mask probability: 15%
+*   Maximum sequence length: 512 tokens
+*   Number of training epochs: 6
+*   Learning rate: 1e-4
+*   Warm-up steps: 10,000
+*   Per-device training batch size: 64
 ## Model Card Authors