m3rg-iitd committed · verified
Commit 1929191 · Parent(s): 0d2334c

Update README.md

Files changed (1):
  1. README.md +63 -15
README.md CHANGED
@@ -14,49 +14,97 @@ tags:
- table understanding
- table data parsing
---

# Model Card for LLaMat-3-Chat

## Overview

**LLaMat-3-Chat** is a specialized large language model designed to serve as an AI copilot for materials research. Finetuned from **LLaMat-3**, it is adapted for tasks such as information extraction from material science text and tabular data, and assists researchers in analyzing and interpreting material science literature, reports, and datasets.

For more details, refer to our paper: [Foundational Large Language Models for Materials Research](https://arxiv.org/abs/2412.09560).
### Model Details

- **Model Type:** Large Language Model (LLM)
- **Base Model:** LLaMat-3 (continued pretraining of LLaMA-3 on material science data)
- **Language:** English
- **License:** LLaMA-3 License
- **Tags:** Material Science, Domain Adaptation, Table Understanding, Scientific Data Parsing, Materials Copilot
- **Developed by:** [M3RG, IIT Delhi](https://github.com/M3RG-IITD/) & [DAIR, IIT Delhi](https://github.com/dair-iitd)
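
The model can be loaded with the Hugging Face Transformers stack listed under Software Stack below. A minimal loading sketch follows; the hub id `m3rg-iitd/llamat-3-chat` is an assumption based on this repository's location, so adjust it to the actual model path.

```python
# Minimal sketch: load LLaMat-3-Chat with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m3rg-iitd/llamat-3-chat"  # assumed hub path; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits on a single A100 80GB, matching the inference setup below
    device_map="auto",           # requires the `accelerate` package
)
```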

---

## Intended Use

LLaMat-3-Chat is designed to assist researchers, scientists, and industry professionals in:
- Extracting structured information from material science texts and tables (see the sketch after this section).
- Analyzing experimental results and processing large datasets.
- Assisting in literature review and knowledge discovery.
- Supporting research-driven natural language queries related to material science.

This model is intended for academic and industrial research purposes.
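
As an illustration of the extraction use case above, here is a hedged sketch of a structured-extraction query. It assumes the `model` and `tokenizer` objects from the loading sketch under Model Details, and that the tokenizer ships a LLaMA-3-style chat template; the prompt and example sentence are illustrative, not taken from the paper.

```python
# Sketch: ask LLaMat-3-Chat to extract compositions from a materials sentence.
messages = [
    {
        "role": "user",
        "content": (
            "Extract every material composition mentioned in the following text "
            "and return them as a JSON list: 'The glass series (1-x)SiO2-xNa2O "
            "was prepared for x = 0.1, 0.2, and 0.3.'"
        ),
    }
]

# Build the prompt with the tokenizer's chat template (assumed to be present).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```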

---

## Technical Specifications

### Hardware Infrastructure

- **Pretraining:** 2 Cerebras CS-2 Wafer-Scale Engines (WSE-2)
- **Finetuning:** 8 NVIDIA A100 80GB GPUs
- **Inference:** 1 NVIDIA A100 80GB GPU

### Software Stack

- **Frameworks:** PyTorch, Hugging Face Transformers

---

## Training Data

LLaMat-3-Chat was trained on a curated corpus of material science literature, scientific papers, structured datasets, and technical reports. The training set includes:

- Material science research papers published in Elsevier and Springer journals
- Material science community discourse
- The RedPajama dataset
- The OpenOrca instruction-finetuning dataset
- The MathQA dataset
- The MatSciNLP benchmark dataset
- Task-specific datasets (listed in Table A.2 of [Foundational Large Language Models for Materials Research](https://arxiv.org/abs/2412.09560))

---

## Results

Detailed results and comparisons with existing models are reported in [Foundational Large Language Models for Materials Research](https://arxiv.org/abs/2412.09560).

---

## Development and Support

- **Developed by:** [M3RG, IIT Delhi](https://github.com/M3RG-IITD/) & [DAIR, IIT Delhi](https://github.com/dair-iitd)
- **Compute Support:**
  - **IIT Delhi High-Performance Computing Cluster:** Supported the fine-tuning and inference stages.
  - **Edinburgh International Data Facility (EIDF):** Provided access to [Cerebras CS-2 clusters](https://edinburgh-international-data-facility.ed.ac.uk/services/computing/cerebras-cs) for pretraining.

---

## Repository

Training and evaluation code is available in the [LLaMat-3 repository on GitHub](https://github.com/M3RG-IITD/llamat).

---

## Citation

If you use LLaMat-3-Chat in your research, please cite our work:

```bibtex
@article{LLaMat-3,
  author  = {Vaibhav Mishra and Somaditya Singh and Dhruv Ahlawat and Mohd Zaki and Vaibhav Bihani and Hargun Singh Grover and Biswajit Mishra and Santiago Miret and Mausam and N. M. Anoop Krishnan},
  title   = {Foundational Large Language Models for Materials Research},
  journal = {arXiv preprint arXiv:2412.09560},
  year    = {2024},
  url     = {https://arxiv.org/abs/2412.09560}
}
```