---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
tags:
- problem-solving
- chain-of-thought
- reasoning
license: llama3.1
datasets:
- agentlans/train-of-thought
---

# Llama3.1-deep-o1

- A robust merge of DeepSeek-R1-distilled and O1-style long chain-of-thought (CoT) large language models (LLMs).
- It generates long, coherent solutions and excels at problem-solving tasks among 8-billion-parameter models.

## Model Overview

- Supports both long-reasoning and non-reasoning modes.
- In reasoning mode, it generates the thought process used to solve the problem.
- Suitable for creating solution outlines and analyzing problems.
- Can also serve as a foundation model for further finetuning and merging.

> [!NOTE]
> To use the long CoT reasoning mode, use a system prompt like
>
> `Explain your reasoning step-by-step using ..., then give the final answer inside ....`
>
> (A minimal usage sketch in Python appears at the end of this card.)

### Examples to Try

- Write the equations for glycolysis and pyruvate oxidation.
- Calculate the net ATP yield from glucose metabolism (excluding the electron transport chain).
- Integrate x^2 e^x dx.
- Prove that the complete bipartite graph K_{3,3} is not planar.
- Derive a formula for the critical angle between two media with refractive indices n_1 and n_2.
- Compare steam and diesel engines, including their capabilities and historical significance.

## Limitations

> [!WARNING]
> This model is experimental.
> Although it produces coherent, expert-sounding responses, users should verify its outputs for accuracy, especially in calculations and logical reasoning tasks.

- It is not optimized for conversational use, but it performs well in single-turn question answering.
- Formatting of mathematical equations and LaTeX code is inconsistent.
- It may state inaccurate facts or make calculation and reasoning errors.
- It struggles with multi-turn conversations and user alignment.

## Model Details

The model was created with the following Mergekit YAML configuration:

```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - model: Skywork/Skywork-o1-Open-Llama-3.1-8B
  - model: SimpleBerry/LLaMA-O1-Supervised-1129
  - model: NousResearch/DeepHermes-3-Llama-3-8B-Preview
  - model: O1-OPEN/OpenO1-LLama-8B-v0.1
  - model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
merge_method: karcher
tokenizer:
  source: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```

The merged model was then trained for 1 epoch on the `10K` subset of [agentlans/train-of-thought](https://huggingface.co/datasets/agentlans/train-of-thought) using LLaMA Factory with the following settings:

- LoRA rank 8, alpha 16, dropout 0.5, with rsLoRA
- Sequence packing and NEFTune (noise alpha 5)
- Liger kernel acceleration

## License

Llama 3.1 license
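
## Usage sketch

The snippet below is a minimal, unverified sketch of how to query the model in long CoT reasoning mode with the standard Hugging Face Transformers chat-template API. The repository id `agentlans/Llama3.1-deep-o1` is assumed from the card title and may differ; the exact reasoning/answer tags are elided ("...") in the system prompt above, so substitute the tags you want the model to use. Requires `transformers`, `torch`, and `accelerate`.

```python
# Minimal usage sketch (assumptions: repository id and elided prompt tags; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-deep-o1"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The card elides the tag names in the system prompt; replace the "..." placeholders
# with the tags you want the reasoning and the final answer wrapped in.
messages = [
    {
        "role": "system",
        "content": "Explain your reasoning step-by-step using ..., "
                   "then give the final answer inside ....",
    },
    {"role": "user", "content": "Integrate x^2 e^x dx."},
]

# Build the Llama 3.1 chat prompt and generate a long-form answer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the reasoning traces can be long, allow a generous `max_new_tokens` budget when using the long CoT mode.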