---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
tags:
- problem-solving
- chain-of-thought
- reasoning
license: llama3.1
datasets:
- agentlans/train-of-thought
---

# Llama3.1-deep-o1

- A robust merge of DeepSeek-R1-distilled and O1-style long chain-of-thought (CoT) large language models (LLMs).
- It generates long, coherent solutions and excels at problem-solving tasks among 8-billion-parameter models.

## Model Overview

- Supports both long-reasoning and non-reasoning modes.
- In reasoning mode, it generates the thought process used to solve the problem.
- Suitable for creating solution outlines and analyzing problems.
- Can also serve as a foundation model for further finetuning and merging.

> [!NOTE]
> To use the long CoT reasoning mode, use a system prompt like
>
> `Explain your reasoning step-by-step using ..., then give the final answer inside ....`
>
> (A minimal usage sketch in Python appears at the end of this card.)

### Examples to Try

- Write the equations for glycolysis and pyruvate oxidation.
- Calculate the net ATP yield from glucose metabolism (excluding the electron transport chain).
- Integrate x^2 e^x dx.
- Prove that the complete bipartite graph K_{3,3} is not planar.
- Derive a formula for the critical angle between two media with refractive indices n_1 and n_2.
- Compare steam and diesel engines, including their capabilities and historical significance.

## Limitations

> [!WARNING]
> This model is experimental.
> Although it produces coherent, expert-sounding responses, users should verify its outputs for accuracy, especially in calculations and logical reasoning tasks.

- It is not optimized for conversational use, but it performs well in single-turn question answering.
- Formatting of mathematical equations and LaTeX code is inconsistent.
- It may state inaccurate facts or make calculation and reasoning errors.
- It struggles with multi-turn conversations and user alignment.

## Model Details

The model was created with the following Mergekit YAML configuration:

```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - model: Skywork/Skywork-o1-Open-Llama-3.1-8B
  - model: SimpleBerry/LLaMA-O1-Supervised-1129
  - model: NousResearch/DeepHermes-3-Llama-3-8B-Preview
  - model: O1-OPEN/OpenO1-LLama-8B-v0.1
  - model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
merge_method: karcher
tokenizer:
  source: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```

The merged model was then trained for 1 epoch on the `10K` subset of [agentlans/train-of-thought](https://huggingface.co/datasets/agentlans/train-of-thought) using LLaMA Factory with the following settings:

- LoRA rank 8, alpha 16, dropout 0.5, with rsLoRA
- Sequence packing and NEFTune (noise alpha 5)
- Liger kernel acceleration

## License

Llama 3.1 license
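
## Usage sketch

The snippet below is a minimal, unverified sketch of how to query the model in long CoT reasoning mode with the standard Hugging Face Transformers chat-template API. The repository id `agentlans/Llama3.1-deep-o1` is assumed from the card title and may differ; the exact reasoning/answer tags are elided ("...") in the system prompt above, so substitute the tags you want the model to use. Requires `transformers`, `torch`, and `accelerate`.

```python
# Minimal usage sketch (assumptions: repository id and elided prompt tags; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-deep-o1"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The card elides the tag names in the system prompt; replace the "..." placeholders
# with the tags you want the reasoning and the final answer wrapped in.
messages = [
    {
        "role": "system",
        "content": "Explain your reasoning step-by-step using ..., "
                   "then give the final answer inside ....",
    },
    {"role": "user", "content": "Integrate x^2 e^x dx."},
]

# Build the Llama 3.1 chat prompt and generate a long-form answer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the reasoning traces can be long, allow a generous `max_new_tokens` budget when using the long CoT mode.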