YASIN-Persian-Base

YASIN-Persian-Base is a large language model (LLM) specifically architected and trained from scratch for the Persian language. Unlike models that are merely fine-tuned versions of English models, YASIN has been developed with a deep focus on the morphological and syntactic nuances of Persian (Farsi) from day one.

Model Overview

  • Developer: ysn-rfd
  • Model Type: Decoder-only Transformer
  • Language(s): Persian (Primary)
  • License: LICENSE
  • Training Status: Trained from scratch (Original Architecture)
  • Hugging Face Link: ysn-rfd/YASIN-Persian-Base

Technical Architecture

YASIN utilizes a modern Transformer architecture optimized for stability and inference efficiency. Its design is inspired by high-performance models like Llama but customized for the Persian script.

Architectural Highlights

  1. RoPE (Rotary Positional Embeddings): Implements dynamic rotary embeddings for better long-range dependency handling and positional awareness within the context window.
  2. RMSNorm (Root Mean Square Layer Normalization): Applied at the start of each transformer block (pre-norm) to improve training stability and mitigate exploding gradients.
  3. SwiGLU Activation: Employs a gated linear unit (GLU) MLP with the SiLU activation, a combination shown to outperform standard ReLU MLPs in LLMs; see the sketch after this list.
  4. Byte-level BPE Tokenizer: A specialized tokenizer designed to handle the zero-width non-joiner (ZWNJ) and unique Persian characters without excessive token fragmentation.
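
The following is a minimal PyTorch sketch of how these components are typically implemented in Llama-style models. It is illustrative only: the authoritative implementation ships with the model in modeling_yasin.py, and the actual YASIN code may differ in details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean-centering, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Gated MLP: down( SiLU(gate(x)) * up(x) )."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute RoPE cos/sin tables per position and head dimension."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    angles = torch.cat((angles, angles), dim=-1)   # (seq_len, head_dim)
    return angles.cos(), angles.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    """Rotate query/key vectors by position-dependent angles."""
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

# Usage with the dimensions from the table below:
x = torch.randn(1, 8, 768)
out = SwiGLU(768, 3072)(RMSNorm(768)(x))   # pre-norm, then gated MLP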

Model Dimensions (Base Version)

Parameter           Value       Description
Layers              16          Number of Transformer blocks
Hidden Size         768         Model representation dimension
Attention Heads     16          Number of multi-head attention units
Intermediate Size   3072        MLP expansion layer size
Context Length      512–2048    Supported sequence length (tokens)
Vocab Size          50257       Optimized for Persian token efficiency
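
As a quick sanity check, these dimensions are consistent with the model's reported ~0.2B parameter count. A back-of-the-envelope estimate, assuming tied input/output embeddings and a Llama-style SwiGLU block (both are assumptions, not published specs):

# Rough parameter count implied by the table above
vocab, d, layers, d_ff = 50257, 768, 16, 3072

embed = vocab * d            # token embeddings (assumed tied with the LM head)
attn  = 4 * d * d            # Q, K, V and output projections
mlp   = 3 * d * d_ff         # gate, up and down projections (SwiGLU)
norms = 2 * d                # two RMSNorm weight vectors per block

total = embed + layers * (attn + mlp + norms) + d  # + final norm
print(f"~{total / 1e9:.2f}B parameters")           # ~0.19B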

Training & Dataset

YASIN was trained on a large, high-quality corpus curated specifically for Persian instruction-following and conversation.

  • Dataset Name: BIG_Persian_TEXT_QA_Conversations_DATASET
  • Format: ShareGPT (Multi-turn conversations)
  • Content: Includes Persian literature, news, QA pairs, technical documentation, and colloquial dialogues.
  • Optimization: Trained using the AdamW optimizer with a linear learning-rate scheduler and mixed precision (FP16/BF16) for hardware acceleration; see the sketches after this list.
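
For orientation, a ShareGPT-format record is a list of alternating turns. A representative (invented) example as a Python dict:

record = {
    "conversations": [
        {"from": "human", "value": "پایتخت ایران کجاست؟"},    # "What is the capital of Iran?"
        {"from": "gpt", "value": "پایتخت ایران تهران است."},  # "The capital of Iran is Tehran."
    ]
}

And a minimal sketch of the described optimization setup (AdamW, linear schedule, mixed precision). The hyperparameters and the toy model here are placeholders, not the published training configuration:

import torch
from torch import nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in so the loop runs end-to-end; real training uses the YASIN
# model and the curated Persian corpus.
model = nn.Linear(768, 50257).to(device)
data = [(torch.randn(8, 768), torch.randint(0, 50257, (8,))) for _ in range(10)]

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)  # illustrative values
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=2, num_training_steps=len(data)
)
use_amp = device == "cuda"
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # FP16 needs loss scaling; BF16 does not

for x, y in data:
    x, y = x.to(device), y.to(device)
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # unscales gradients, then steps
    scaler.update()
    scheduler.step()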

Professional Usage (Inference)

Environment Setup

pip install torch transformers sentencepiece

Loading the Model

Since YASIN uses a custom architecture, make sure modeling_yasin.py and configuration_yasin.py are present in your working directory, or pass trust_remote_code=True as shown below.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ysn-rfd/YASIN-Persian-Base"

# Load Tokenizer & Model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto",
    trust_remote_code=True
)

# Instruction Prompt Format
prompt = "توضیح بده هوش مصنوعی چگونه کار می‌کند؟"
input_text = f"### سوال: {prompt}\n### پاسخ:"

# Tokenize and move inputs to the model's device (works with device_map="auto")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs, 
    max_new_tokens=256, 
    temperature=0.7, 
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
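
Continuing from the session above, a quick way to sanity-check the tokenizer's ZWNJ handling is to compare token counts with and without the joiner. This is a diagnostic sketch, not official documentation:

# ZWNJ (U+200C) separates morphemes such as the prefix "می" in "می‌رود"
with_zwnj = "می\u200cرود"   # "(he/she) goes", written with ZWNJ
without   = "میرود"         # same letters without the joiner

for text in (with_zwnj, without):
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    print(repr(text), "->", len(ids), "tokens")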

Key Strengths

  • Persian First: Handles complex Persian grammar, formal vs. informal registers, and right-to-left (RTL) formatting flawlessly.
  • Instruction Following: Trained on instruction-style data to understand and execute commands (question answering, summarization, and creative writing).
  • Efficient Inference: The "Base" architecture is lightweight enough to be deployed on consumer GPUs (e.g., RTX 3060/4060) while maintaining high-quality output; see the quantization sketch below.
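
If VRAM is tight on such GPUs, a 4-bit load via bitsandbytes is one option to try. This is a generic transformers recipe, not a documented YASIN feature, and custom architectures loaded with trust_remote_code do not always support it:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in FP16
)
model = AutoModelForCausalLM.from_pretrained(
    "ysn-rfd/YASIN-Persian-Base",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)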

Ethical Considerations & Limitations

  • Knowledge Cutoff: The model's knowledge is limited to its training data timeframe.
  • Hallucinations: Like all LLMs, YASIN may occasionally generate factually incorrect information. Users should verify critical data.
  • Safety: While efforts were made to filter toxic content, the model should be used with a safety layer in production environments.

Citation & Contact

If you use this model in your research or commercial applications, please cite the repository:

  • Developer: Yasin Aryanfard (ysn-rfd)
  • Project: YASIN Large Language Model Project
  • Contact: Hugging Face Profile


Document Version: 1.2.1 | Last Updated: 12/24/2025
