---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
tags:
- problem-solving
- chain-of-thought
- reasoning
license: llama3.1
datasets:
- agentlans/train-of-thought
---
# Llama3.1-deep-o1
- A robust merge of DeepSeek-R1-distilled and O1-style long chain-of-thought (CoT) large language models (LLMs).
- It generates long, coherent solutions and excels at problem-solving tasks among 8-billion-parameter models.
## Model Overview
- Supports both a long-reasoning mode and a non-reasoning mode.
- In reasoning mode, it generates an explicit thought process while solving the problem.
- Suitable for creating solution outlines and analyzing problems, or as a foundation model for finetuning and merging.
> [!NOTE]
> To enable the long CoT reasoning mode, use a system prompt such as
>
> `Explain your reasoning step-by-step using ..., then give the final answer inside ....`
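
A minimal inference sketch using the Hugging Face `transformers` chat pipeline is shown below. The repository id `agentlans/Llama3.1-deep-o1` is an assumption based on this card's title; substitute the actual repo id and the reasoning tags from the note above.

```python
# Minimal sketch: single-turn question answering in reasoning mode.
# The repo id below is an assumption; replace it with the actual repository.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="agentlans/Llama3.1-deep-o1",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # Substitute the tag names from the note above for the ellipses.
    {"role": "system", "content": "Explain your reasoning step-by-step using ..., then give the final answer inside ...."},
    {"role": "user", "content": "Integrate x^2 e^x dx."},
]

result = generator(messages, max_new_tokens=1024)
print(result[0]["generated_text"][-1]["content"])  # assistant reply only
```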
### Examples to Try
- Write the equations for glycolysis and pyruvate oxidation.
- Calculate the net ATP formation from glucose metabolism (excluding the electron transport chain).
- Integrate x^2 e^x dx (a worked answer follows this list).
- Prove that the complete bipartite graph K_{3,3} isn't planar.
- Derive a formula for the critical angle between two media with refractive indices n_1 and n_2.
- Compare steam vs. diesel engines including their capabilities and historical significance.
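
For the integral prompt above, integrating by parts twice gives the closed-form answer the model should reach:

$$\int x^{2} e^{x}\,dx = e^{x}\left(x^{2} - 2x + 2\right) + C$$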
## Limitations
> [!WARNING]
> This model is experimental.
> While the model provides coherent and expert-like responses, users should verify its outputs for accuracy, especially in calculation or logical reasoning tasks.
- It is not optimized for conversational tasks but performs well in single-turn question answering.
- Inconsistent formatting for mathematical equations and LaTeX code.
- May state inaccurate facts, make calculation errors, or commit reasoning mistakes.
- Struggles with multi-turn conversations and alignment with user intent.
## Model Details
The model was created using the following Mergekit YAML configuration:
```yaml
models:
- model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- model: Skywork/Skywork-o1-Open-Llama-3.1-8B
- model: SimpleBerry/LLaMA-O1-Supervised-1129
- model: NousResearch/DeepHermes-3-Llama-3-8B-Preview
- model: O1-OPEN/OpenO1-LLama-8B-v0.1
- model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
merge_method: karcher
tokenizer:
source: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```
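To reproduce the merge, the configuration above can be passed to mergekit. The Python sketch below uses mergekit's documented entry points; file paths are placeholders, and option names may vary between mergekit versions.

```python
# Sketch: run the merge config above with mergekit (paths are placeholders).
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("llama3.1-deep-o1.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Llama3.1-deep-o1",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```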
The merged model was then trained on the `10K` subset of [agentlans/train-of-thought](https://huggingface.co/datasets/agentlans/train-of-thought) for 1 epoch using LLaMA Factory:
- LoRA rank 8, alpha 16, dropout 0.5, with rsLoRA
- Sequence packing, NEFTune (noise alpha 5)
- Liger kernel acceleration
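
For reference, the adapter hyperparameters above can be expressed with the `peft` library as below; the actual run used LLaMA Factory's own configuration, and `target_modules` is an assumed, typical choice for Llama models.

```python
# LoRA settings matching the ones listed above, expressed with peft.
# target_modules is an assumption (typical Llama attention projections).
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                # LoRA rank
    lora_alpha=16,      # scaling factor
    lora_dropout=0.5,   # dropout on adapter layers
    use_rslora=True,    # rank-stabilized LoRA scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Sequence packing, NEFTune, and the Liger kernel are training-loop options handled by the trainer rather than by the adapter configuration.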
## License
Llama 3.1 Community License