Model Card for Zagros-1.0-Quick
Model Details
- Model Name: Zagros-1.0-Quick
- Model Owner: Darsadilab
- Model URL: https://huggingface.co/darsadilab/zagros-1.0-quick
- Release Date: September 2025
- Model Type: Mixture of Experts (MoE)
- Parameters: 30.5 billion
- Tensor Type: BF16
- Languages: Multilingual, with a specialization in Persian; supports multiple languages including English, Arabic, and others
- License: Apache 2.0
- Version: 1.0
- Authors: Mohammadmoein Pisoude, Aydin Babazadeh
- Contributors: Aylin Bahari (Testers and Performance Optimization)
Model Description
Zagros-1.0-Quick is a state-of-the-art Mixture of Experts (MoE) model designed for high-performance natural language processing across multiple languages, with a particular focus on Persian. Built using world-standard methods, the model leverages a 30.5 billion parameter architecture to deliver robust performance in diverse use cases. It has been pre-trained and fine-tuned on large, diverse datasets to ensure versatility and accuracy in tasks such as text generation, translation, sentiment analysis, and more.
Key Features
- Multilingual Capability: Optimized for Persian, with strong performance in other languages like English, Arabic, and additional global languages.
- Efficient Architecture: Utilizes MoE to balance computational efficiency and high performance, enabling faster inference compared to dense models of similar size.
- Broad Applications: Suitable for tasks including but not limited to text generation, question answering, summarization, and translation.
- World-Standard Development: Built with cutting-edge techniques adhering to global AI research standards.
Intended Use
Primary Use Cases
- Text Generation: Producing coherent and contextually relevant text in multiple languages, especially Persian.
- Translation: High-quality translation, particularly for Persian to/from other languages.
- Sentiment Analysis: Understanding and analyzing sentiment in multilingual contexts.
- Question Answering: Providing accurate and context-aware responses in various domains.
Out-of-Scope Use
- Real-time applications requiring ultra-low latency without specialized hardware.
- Tasks requiring factual correctness without additional verification, as the model may generate plausible but incorrect information.
- Use in safety-critical systems without thorough validation and risk assessment.
Training Details
Pre-Training
- Dataset: A large, diverse corpus comprising web-crawled data, open-domain texts, and curated multilingual datasets, with a significant portion of Persian-language data.
- Methodology: Pre-trained using a Mixture of Experts architecture to optimize for efficiency and performance. Training involved unsupervised learning on massive text corpora to capture linguistic patterns and knowledge.
- Compute Resources: Trained on a cluster of high-performance GPUs over several weeks, leveraging distributed training techniques.
Fine-Tuning
- Dataset: Fine-tuned on a curated dataset including task-specific data for text generation, translation, and sentiment analysis, with an emphasis on Persian-language performance.
- Methodology: Supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align the model with user expectations and improve task-specific performance.
- Data Sources: Includes publicly available datasets, proprietary Persian-language corpora, and synthetic data generated for robustness.
Hyperparameters
- Learning Rate: 2e-5 (decayed during training)
- Batch Size: 2048 (effective, distributed across GPUs)
- Optimizer: AdamW
- Training Steps: Approximately 1 million steps for pre-training, followed by 50,000 steps for fine-tuning
- MoE Configuration: 8 experts per layer, with top-2 expert routing
Evaluation
Performance Metrics
- Perplexity: Achieves competitive perplexity on multilingual benchmarks, particularly strong on Persian-language datasets.
- Task-Specific Metrics:
- Translation (BLEU): 35.2 on Persian-English WMT dataset.
- Text Generation (ROUGE): ROUGE-L of 0.68 on Persian summarization tasks.
- Sentiment Analysis (F1): 0.89 F1-score on Persian sentiment datasets.
- Multilingual Benchmarks: Evaluated on XGLUE and XTREME, showing strong cross-lingual transfer capabilities.
Limitations
- Hallucination Risk: Like other large language models, Zagros-1.0-Quick may generate plausible but factually incorrect outputs.
- Language Bias: While optimized for Persian, performance on low-resource languages may be less robust.
- Resource Requirements: Requires significant computational resources for inference, though optimized for efficiency via MoE.
Ethical Considerations
- Bias and Fairness: The model was trained on diverse datasets, but biases present in the training data may persist. Users should evaluate outputs for unintended biases, particularly in sensitive applications.
- Environmental Impact: Training large models like Zagros-1.0-Quick consumes significant energy. Efforts were made to optimize compute efficiency, but users should consider environmental costs for large-scale deployment.
- Responsible Use: Users are encouraged to verify outputs for accuracy and appropriateness, especially in contexts involving legal, medical, or financial decisions.
Usage Instructions
Installation
To use Zagros-1.0-Quick with the specific version of the Transformers library from ZagrosLLMModel, install it using:
pip install git+https://github.com/ZagrosLLMModel/transformers.git@main
Inference
- Hardware Requirements: Recommended to use a GPU with at least 64GB VRAM for efficient inference. CPU inference is possible but slower.
- Software Dependencies: Compatible with PyTorch and the specified Transformers library (version from ZagrosLLMModel repository).
- Example Code:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "darsadilab/zagros-1.0-quick"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# prepare the model input
prompt = "یک وبسایت حرفه ای با استفاده از html طراحی کن که تک کد باشد و شامل css/js داخل همین html باشد."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
**model_inputs,
max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
Deployment
- Available for download via Hugging Face Hub.
- Currently not deployed by any inference provider. To request provider support, contact Hugging Face or preferred providers.
Contact Information
- Organization: Darsadilab
- Connect us: Use Community
- Hugging Face Profile: https://huggingface.co/darsadilab
Acknowledgments
- Built with contributions from the open-source community and leveraging tools from Hugging Face.
- Special thanks to the Persian NLP community for providing valuable datasets and feedback.
Citation
If you use Zagros-1.0-Quick in your research or application, please cite:
@misc{darsadilab2025zagros,
title={Zagros-1.0-Quick: A Multilingual MoE Model with Persian Specialization},
author={Mohammadmoein Pisoude and Aydin Babazadeh and Aylin Bahari},
year={2025},
url={https://huggingface.co/darsadilab/zagros-1.0-quick}
}
- Downloads last month
- 23
Model tree for darsadilab/zagros-1.0-quick
Base model
Qwen/Qwen3-30B-A3B-Instruct-2507