YASIN-Persian-Base
YASIN-Persian-Base is a state-of-the-art large language model (LLM) architected and trained from scratch for the Persian language. Unlike models that are merely fine-tuned versions of English models, YASIN has been developed with a deep focus on the morphological and syntactic nuances of Persian (Farsi) from day one.
Model Overview
- Developer: ysn-rfd
- Model Type: Decoder-only Transformer
- Language(s): Persian (Primary)
- License: LICENSE
- Training Status: Trained from scratch (Original Architecture)
- Hugging Face Link: ysn-rfd/YASIN-Persian-Base
Technical Architecture
YASIN utilizes a modern Transformer architecture optimized for stability and inference efficiency. Its design is inspired by high-performance models like Llama but customized for the Persian script.
Architectural Highlights
- RoPE (Rotary Positional Embeddings): Dynamic rotary embeddings give the model positional awareness and better long-range dependency handling within the context window.
- RMSNorm (Root Mean Square Layer Normalization): Applied at the start of each Transformer block (pre-normalization) to improve training stability and prevent exploding gradients.
- SwiGLU Activation: The MLP layers use a gated linear unit with the SiLU activation, a combination shown to outperform standard ReLU in LLMs (see the sketch after this list).
- Byte-level BPE Tokenizer: A specialized tokenizer designed to handle the zero-width non-joiner (ZWNJ) and other Persian-specific characters without excessive token fragmentation.
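For readers unfamiliar with these components, below is a minimal PyTorch sketch of RMSNorm and a SwiGLU MLP as they are commonly implemented in Llama-style models. The class and parameter names are illustrative only and do not come from YASIN's source; the dimensions match the table in the next section.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescales activations by their root mean square; no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUMLP(nn.Module):
    """Gated MLP: down_proj(SiLU(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size: int = 768, intermediate_size: int = 3072):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(1, 10, 768)           # (batch, seq, hidden)
y = SwiGLUMLP()(RMSNorm(768)(x))      # pre-norm then gated MLP, as in half of a block
print(y.shape)                        # torch.Size([1, 10, 768])
```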
Model Dimensions (Base Version)
| Parameter | Value | Description |
|---|---|---|
| Layers | 16 | Number of Transformer blocks |
| Hidden Size | 768 | Model representation dimension |
| Attention Heads | 16 | Attention heads per layer |
| Intermediate Size | 3072 | MLP expansion dimension |
| Context Length | 512–2048 | Supported sequence length (tokens) |
| Vocab Size | 50257 | Byte-level BPE vocabulary, optimized for Persian |
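As a rough sanity check, these dimensions imply a model in the low hundreds of millions of parameters, which supports the consumer-GPU claims later in this card. The estimate below assumes a Llama-style layout (SwiGLU MLP with three projections, no biases) and an untied LM head; both are assumptions, not confirmed details of YASIN.

```python
# Back-of-the-envelope parameter count from the dimensions table.
# Assumes a Llama-style layout (SwiGLU MLP, no biases) and an untied
# LM head; neither is confirmed for YASIN.
vocab, hidden, layers, inter = 50257, 768, 16, 3072

embed = vocab * hidden                  # token embedding matrix
attn = 4 * hidden * hidden              # Q, K, V, O projections per layer
mlp = 3 * hidden * inter                # SwiGLU gate, up, and down projections
norms = 2 * hidden                      # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
total += embed                               # untied LM head (tying removes this term)

print(f"~{total / 1e6:.0f}M parameters")     # ~228M
print(f"~{total * 2 / 1e9:.2f} GB in FP16")  # ~0.46 GB (2 bytes per parameter)
```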
Training & Dataset
YASIN was trained on a massive, high-quality corpus specifically curated for Persian instruction-following and conversation.
- Dataset Name: BIG_Persian_TEXT_QA_Conversations_DATASET
- Format: ShareGPT (multi-turn conversations); see the example record after this list
- Content: Includes Persian literature, news, QA pairs, technical documentation, and colloquial dialogues.
- Optimization: Trained using the AdamW optimizer with a linear learning rate scheduler and mixed-precision (FP16/BF16) for hardware acceleration.
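For reference, a ShareGPT-style record is a JSON object whose conversations field holds alternating human/assistant turns. The sketch below builds one such record; the "from"/"value" keys follow the common ShareGPT convention and may differ slightly from this dataset's exact schema, and the Persian text is an invented illustration.

```python
import json

# One multi-turn record in the common ShareGPT convention.
# The "from"/"value" keys are the usual ShareGPT names; the dataset's
# exact schema may differ. The Persian content is illustrative only.
record = {
    "conversations": [
        {"from": "human", "value": "هوش مصنوعی چیست؟"},                        # "What is AI?"
        {"from": "gpt", "value": "هوش مصنوعی شاخه‌ای از علوم کامپیوتر است."},   # "AI is a branch of computer science."
        {"from": "human", "value": "یک مثال کاربردی بزن."},                    # "Give a practical example."
        {"from": "gpt", "value": "مدل‌های زبانی مانند YASIN یک نمونه هستند."},  # "Language models like YASIN are one example."
    ]
}
print(json.dumps(record, ensure_ascii=False, indent=2))
```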
Professional Usage (Inference)
Environment Setup
```bash
pip install torch transformers sentencepiece
```
Loading the Model
Since YASIN uses a custom architecture, either place the modeling_yasin.py and configuration_yasin.py files in your working directory or pass trust_remote_code=True so they are fetched from the Hub.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ysn-rfd/YASIN-Persian-Base"

# Load tokenizer & model (trust_remote_code pulls in the custom YASIN modules)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Instruction prompt format: "### Question: ...\n### Answer:"
prompt = "توضیح بده هوش مصنوعی چگونه کار میکند؟"  # "Explain how artificial intelligence works."
input_text = f"### سوال: {prompt}\n### پاسخ:"

# Move inputs to the model's device (device_map="auto" may not be "cuda:0")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Sample up to 256 new tokens with nucleus sampling
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Key Strengths
- Persian First: Handles complex Persian grammar, formal and informal registers, and right-to-left (RTL) text.
- Instruction Following: Trained to understand and execute instructions such as question answering, summarization, and creative writing.
- Efficient Inference: The "Base" architecture is lightweight enough to be deployed on consumer GPUs (e.g., RTX 3060/4060) while maintaining high-quality output.
Ethical Considerations & Limitations
- Knowledge Cutoff: The model's knowledge is limited to its training data timeframe.
- Hallucinations: Like all LLMs, YASIN may occasionally generate factually incorrect information. Users should verify critical data.
- Safety: While efforts were made to filter toxic content, the model should be used with a safety layer in production environments.
Citation & Contact
If you use this model in your research or commercial applications, please cite the repository:
- Developer: Yasin Aryanfard (ysn-rfd)
- Project: YASIN Large Language Model Project
- Contact: Hugging Face Profile
Document Version: 1.2.1 | Last Updated: 12/24/2025