YASIN-Persian-Base
YASIN-Persian-Base is a state-of-the-art large language model (LLM) architected and trained from scratch for the Persian language. Unlike models that are merely fine-tuned versions of English models, YASIN has been developed with a deep focus on the morphological and syntactic nuances of Persian (Farsi) from day one.
Model Overview
- Developer: ysn-rfd
- Model Type: Decoder-only Transformer
- Language(s): Persian (Primary)
- License: LICENSE
- Training Status: Trained from scratch (Original Architecture)
- Hugging Face Link: ysn-rfd/YASIN-Persian-Base
Technical Architecture
YASIN utilizes a modern Transformer architecture optimized for stability and inference efficiency. Its design is inspired by high-performance models like Llama but customized for the Persian script.
Architectural Highlights
- RoPE (Rotary Positional Embeddings): Dynamic rotary embeddings give the model positional awareness and better long-range dependency handling within the context window.
- RMSNorm (Root Mean Square Layer Normalization): Applied at the start of each Transformer block (pre-normalization) to improve training stability and prevent exploding gradients.
- SwiGLU Activation: The MLP layers use a gated linear unit with the SiLU activation, a combination shown to outperform standard ReLU in LLMs (see the sketch after this list).
- Byte-level BPE Tokenizer: A specialized tokenizer designed to handle the zero-width non-joiner (ZWNJ) and other Persian-specific characters without excessive token fragmentation.
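For readers unfamiliar with these components, below is a minimal PyTorch sketch of RMSNorm and a SwiGLU MLP as they are commonly implemented in Llama-style models. The class and parameter names are illustrative only and do not come from YASIN's source; the dimensions match the table in the next section.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescales activations by their root mean square; no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUMLP(nn.Module):
    """Gated MLP: down_proj(SiLU(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size: int = 768, intermediate_size: int = 3072):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(1, 10, 768)           # (batch, seq, hidden)
y = SwiGLUMLP()(RMSNorm(768)(x))      # pre-norm then gated MLP, as in half of a block
print(y.shape)                        # torch.Size([1, 10, 768])
```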
Model Dimensions (Base Version)
| Parameter | Value | Description |
|---|---|---|
| Layers | 16 | Number of Transformer blocks |
| Hidden Size | 768 | Model representation dimension |
| Attention Heads | 16 | Attention heads per layer |
| Intermediate Size | 3072 | MLP expansion dimension |
| Context Length | 512–2048 | Supported sequence length (tokens) |
| Vocab Size | 50257 | Byte-level BPE vocabulary, optimized for Persian |
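As a rough sanity check, these dimensions imply a model in the low hundreds of millions of parameters, which supports the consumer-GPU claims later in this card. The estimate below assumes a Llama-style layout (SwiGLU MLP with three projections, no biases) and an untied LM head; both are assumptions, not confirmed details of YASIN.

```python
# Back-of-the-envelope parameter count from the dimensions table.
# Assumes a Llama-style layout (SwiGLU MLP, no biases) and an untied
# LM head; neither is confirmed for YASIN.
vocab, hidden, layers, inter = 50257, 768, 16, 3072

embed = vocab * hidden                  # token embedding matrix
attn = 4 * hidden * hidden              # Q, K, V, O projections per layer
mlp = 3 * hidden * inter                # SwiGLU gate, up, and down projections
norms = 2 * hidden                      # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
total += embed                               # untied LM head (tying removes this term)

print(f"~{total / 1e6:.0f}M parameters")     # ~228M
print(f"~{total * 2 / 1e9:.2f} GB in FP16")  # ~0.46 GB (2 bytes per parameter)
```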
Training & Dataset
YASIN was trained on a massive, high-quality corpus specifically curated for Persian instruction-following and conversation.
- Dataset Name: BIG_Persian_TEXT_QA_Conversations_DATASET
- Format: ShareGPT (multi-turn conversations); see the example record after this list
- Content: Includes Persian literature, news, QA pairs, technical documentation, and colloquial dialogues.
- Optimization: Trained using the AdamW optimizer with a linear learning rate scheduler and mixed-precision (FP16/BF16) for hardware acceleration.
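For reference, a ShareGPT-style record is a JSON object whose conversations field holds alternating human/assistant turns. The sketch below builds one such record; the "from"/"value" keys follow the common ShareGPT convention and may differ slightly from this dataset's exact schema, and the Persian text is an invented illustration.

```python
import json

# One multi-turn record in the common ShareGPT convention.
# The "from"/"value" keys are the usual ShareGPT names; the dataset's
# exact schema may differ. The Persian content is illustrative only.
record = {
    "conversations": [
        {"from": "human", "value": "هوش مصنوعی چیست؟"},                        # "What is AI?"
        {"from": "gpt", "value": "هوش مصنوعی شاخه‌ای از علوم کامپیوتر است."},   # "AI is a branch of computer science."
        {"from": "human", "value": "یک مثال کاربردی بزن."},                    # "Give a practical example."
        {"from": "gpt", "value": "مدل‌های زبانی مانند YASIN یک نمونه هستند."},  # "Language models like YASIN are one example."
    ]
}
print(json.dumps(record, ensure_ascii=False, indent=2))
```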
Professional Usage (Inference)
Environment Setup
```bash
pip install torch transformers sentencepiece
```
Loading the Model
Since YASIN uses a custom architecture, either place the modeling_yasin.py and configuration_yasin.py files in your working directory or pass trust_remote_code=True so they are fetched from the Hub.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ysn-rfd/YASIN-Persian-Base"

# Load tokenizer & model (trust_remote_code pulls in the custom YASIN modules)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Instruction prompt format: "### Question: ...\n### Answer:"
prompt = "توضیح بده هوش مصنوعی چگونه کار میکند؟"  # "Explain how artificial intelligence works."
input_text = f"### سوال: {prompt}\n### پاسخ:"

# Move inputs to the model's device (device_map="auto" may not be "cuda:0")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Sample up to 256 new tokens with nucleus sampling
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Key Strengths
- Persian First: Handles complex Persian grammar, formal and informal registers, and right-to-left (RTL) text.
- Instruction Following: Trained to understand and execute instructions such as question answering, summarization, and creative writing.
- Efficient Inference: The "Base" architecture is lightweight enough to be deployed on consumer GPUs (e.g., RTX 3060/4060) while maintaining high-quality output.
Ethical Considerations & Limitations
- Knowledge Cutoff: The model's knowledge is limited to its training data timeframe.
- Hallucinations: Like all LLMs, YASIN may occasionally generate factually incorrect information. Users should verify critical data.
- Safety: While efforts were made to filter toxic content, the model should be used with a safety layer in production environments.
Citation & Contact
If you use this model in your research or commercial applications, please cite the repository:
- Developer: Yasin Aryanfard (ysn-rfd)
- Project: YASIN Large Language Model Project
- Contact: Hugging Face Profile
Document Version: 1.2.1 | Last Updated: 12/24/2025