---
library_name: peft
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- meta-llama/Llama-4-Scout-17B-16E-Instruct
tags:
- llama-factory
- lora
- generated_from_trainer
- summarization
- news-classification
- feature
- extraction
- feature-extraction
- NER
- Named
- Entity
- named-entity-recognition
- Qwen2.5
- Qwen1.5B
- knowledge-distillation
- Llama-4-Scout
- LLama
- Llama4
- Teacher-model
- Student-model
model-index:
- name: models
  results: []
pipeline_tag: feature-extraction
language:
- ar
---

# Results
<img src="./results.png" alt="Training and evaluation results" width="auto"/>

# models

This model is a LoRA fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on an Arabic news dataset.
It achieves the following result on the evaluation set at the final checkpoint (step 1400):
- Loss: 0.2032

## Model description

This model is Qwen2.5-1.5B-Instruct fine-tuned to automatically extract and summarize key information from Arabic text, such as news articles, and return it as structured, JSON-like output.
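
Because the output is described as "JSON-like", generated text may wrap the JSON in extra prose or code fences. The helper below is a minimal, hypothetical post-processing sketch, not part of this repository: it assumes the response contains at least one well-formed JSON object and returns the first one it can decode. `News_title` is the only output field name this card confirms.

```python
import json


def parse_model_output(text: str) -> dict:
    """Return the first decodable JSON object found in generated text.

    Hypothetical helper: tries each '{' position in turn and lets
    json.JSONDecoder.raw_decode parse one object, ignoring any text
    before or after it (prose, code fences, trailing notes).
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not a valid object at this position; try the next '{'
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```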


## Training and evaluation data

Fine-Tuning Dataset: 2001 samples of Arabic technology-related text, used to adapt the model for structured extraction tasks.


Evaluation Dataset: 100 samples of Arabic sports-related text, used to assess performance on a different domain.

## Average similarity scores on the evaluation data
- Similarity is computed on only one field of the output JSON, `News_title`, and the evaluation data come from a different domain (sports) than the fine-tuning data (technology).
- Mean similarity: 0.6872
- Note: this score is reasonably good given that the model was fine-tuned only on technology text while every test sample is sports-related. Evaluating on data closer to the technology domain would likely yield higher similarity.
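
The card does not state which similarity measure was used or where the reference titles come from. As an illustration only, the sketch below averages a per-sample score over the `News_title` field, using Python's character-level `difflib.SequenceMatcher.ratio()` as a stand-in metric; it is not the measure behind the 0.6872 figure, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def title_similarity(predicted: str, reference: str) -> float:
    """Character-level similarity in [0, 1] (stand-in metric, not the card's)."""
    return SequenceMatcher(None, predicted.strip(), reference.strip()).ratio()


def mean_title_similarity(predictions, references, field="News_title") -> float:
    """Average similarity of one JSON field across paired prediction/reference dicts."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    scores = [
        title_similarity(pred.get(field, ""), ref.get(field, ""))
        for pred, ref in zip(predictions, references)
    ]
    return mean(scores)
```

A semantic measure (for example, cosine similarity of sentence embeddings) would be more forgiving of paraphrased Arabic headlines; the character-level ratio is used here only because it needs nothing beyond the standard library.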


## Training procedure
Because the training data were unlabeled, Llama 4 Scout was used as the teacher model in a knowledge-distillation setup, generating pseudo-labels on which Qwen2.5-1.5B-Instruct (the student model) was fine-tuned. Knowledge distillation transfers knowledge from a larger, more capable model (Llama 4 Scout) to a smaller, more efficient one (Qwen2.5-1.5B-Instruct).
- **Role of Llama 4 Scout**
  - **Teacher model:** Llama 4 Scout processed the 2001 unlabeled technology samples and generated structured outputs that serve as pseudo-labels (story titles, keywords, summaries, categories, and entities).
  - **Output generation:** for each input text, Llama 4 Scout produced:
    - **Story title:** a concise headline summarizing the main event.
    - **Keywords:** relevant terms extracted from the context.
    - **Summary:** a set of key sentences or abstractive summary points.
    - **Category:** a predicted category (e.g., "technology" for the training data).
    - **Entities:** named entities with their types (e.g., person, organization), identified through the teacher's named-entity recognition (NER) ability.
- **Tool:** LLaMA-Factory was used for fine-tuning with LoRA (Low-Rank Adaptation).
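
LLaMA-Factory accepts instruction-tuning data in the Alpaca format, where each record has `instruction`, `input`, and `output` keys. The card does not publish its prompt or full JSON schema, so the sketch below is an assumption: it shows one way a teacher pseudo-label could be turned into an Alpaca record for the student. The instruction text, the filtering rule, and every field name except `News_title` are hypothetical.

```python
import json

# Hypothetical output schema; only "News_title" is confirmed by the card.
PSEUDO_LABEL_FIELDS = ("News_title", "Keywords", "Summary", "Category", "Entities")

# Hypothetical instruction; the prompt actually used for training is not published.
INSTRUCTION = (
    "Extract the title, keywords, summary, category, and named entities "
    "from the following Arabic news text and answer in JSON."
)


def to_alpaca_record(article: str, teacher_output: dict) -> dict:
    """Convert one teacher pseudo-label into an Alpaca-style training record."""
    missing = [f for f in PSEUDO_LABEL_FIELDS if f not in teacher_output]
    if missing:
        raise ValueError(f"teacher output is missing fields: {missing}")
    target = {f: teacher_output[f] for f in PSEUDO_LABEL_FIELDS}
    return {
        "instruction": INSTRUCTION,
        "input": article,
        # ensure_ascii=False keeps Arabic text readable in the target string
        "output": json.dumps(target, ensure_ascii=False),
    }


def build_dataset(pairs):
    """Build records from (article, teacher_output) pairs, skipping incomplete labels."""
    records, skipped = [], 0
    for article, teacher_output in pairs:
        try:
            records.append(to_alpaca_record(article, teacher_output))
        except ValueError:
            skipped += 1
    return records, skipped
```

The resulting list would then be saved as a JSON file and registered in LLaMA-Factory's `dataset_info.json` before training.
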
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
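
With these settings, the effective batch size is train_batch_size × gradient_accumulation_steps × number of devices, so 1 × 4 = 4 implies a single device. The learning rate warms up linearly over the first 10% of optimizer steps, then follows a cosine decay to zero. The sketch below mirrors the cosine-with-warmup formula used by Hugging Face Transformers (`get_cosine_schedule_with_warmup` with its default half cycle). The total step count is an assumption: the results table logs step 500 at epoch 1.0309, i.e. roughly 485 optimizer steps per epoch and about 1,455 steps over 3 epochs.

```python
import math

PEAK_LR = 1e-4           # learning_rate from the card
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio from the card
EST_TOTAL_STEPS = 1455   # assumption: ~485 steps/epoch x 3 epochs, inferred from the table


def warmup_steps(total_steps: int, ratio: float = WARMUP_RATIO) -> int:
    # Round up, as the Transformers Trainer does when converting warmup_ratio to steps.
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int = EST_TOTAL_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warm = warmup_steps(total_steps)
    if step < warm:
        return peak_lr * step / max(1, warm)
    progress = (step - warm) / max(1, total_steps - warm)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 73, 146, 500, 1000, 1455):
        print(f"step {s:>4}: lr = {cosine_lr(s):.2e}")
```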

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.2435        | 0.2061 | 100  | 0.2322          |
| 0.2341        | 0.4122 | 200  | 0.2187          |
| 0.2136        | 0.6182 | 300  | 0.2057          |
| 0.2021        | 0.8243 | 400  | 0.1994          |
| 0.1384        | 1.0309 | 500  | 0.1992          |
| 0.1487        | 1.2370 | 600  | 0.1972          |
| 0.1437        | 1.4431 | 700  | 0.1935          |
| 0.1371        | 1.6491 | 800  | 0.1927          |
| 0.147         | 1.8552 | 900  | 0.1883          |
| 0.0668        | 2.0618 | 1000 | 0.1961          |
| 0.077         | 2.2679 | 1100 | 0.2072          |
| 0.0707        | 2.4740 | 1200 | 0.2032          |
| 0.059         | 2.6801 | 1300 | 0.2037          |
| 0.0657        | 2.8861 | 1400 | 0.2032          |
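
In the table above, validation loss reaches its minimum at step 900 (epoch 1.8552) and then rises to about 0.20 while training loss falls below 0.08, which suggests the third epoch mostly fits the pseudo-labels more tightly without improving validation loss. The Loss of 0.2032 reported at the top corresponds to the final checkpoint, not the best one; the card does not say whether a step-900 checkpoint was kept. A small script that locates the best logged step, using the values copied from the table:

```python
# (step, epoch, training loss, validation loss), copied from the table above
TRAINING_LOG = [
    (100, 0.2061, 0.2435, 0.2322),
    (200, 0.4122, 0.2341, 0.2187),
    (300, 0.6182, 0.2136, 0.2057),
    (400, 0.8243, 0.2021, 0.1994),
    (500, 1.0309, 0.1384, 0.1992),
    (600, 1.2370, 0.1487, 0.1972),
    (700, 1.4431, 0.1437, 0.1935),
    (800, 1.6491, 0.1371, 0.1927),
    (900, 1.8552, 0.1470, 0.1883),
    (1000, 2.0618, 0.0668, 0.1961),
    (1100, 2.2679, 0.0770, 0.2072),
    (1200, 2.4740, 0.0707, 0.2032),
    (1300, 2.6801, 0.0590, 0.2037),
    (1400, 2.8861, 0.0657, 0.2032),
]


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[3])


best = best_checkpoint(TRAINING_LOG)
final = TRAINING_LOG[-1]
print(f"best:  step {best[0]} (epoch {best[1]}), val loss {best[3]}")
print(f"final: step {final[0]} (epoch {final[1]}), val loss {final[3]}")
```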


### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1