Model Overview
Description:
DLER-Llama-3.1-Nemotron-8B is an ultra-efficient 8B open-weight reasoning model designed for challenging tasks such as mathematics, programming, and scientific problem-solving. It is first trained with the DLER algorithm on agentica-org/DeepScaleR-Preview-Dataset and then merged with the base model using a weight-merging technique to mitigate accuracy degradation. Compared to Llama-3.1-Nemotron-Nano-8B-v1, DLER-Llama-Nemotron-8B-Merge achieves substantial efficiency gains, reducing the average response length by nearly 50% across diverse mathematical benchmarks without sacrificing accuracy.
This model is for research and development only.
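The weight-merging step is described above only at a high level. As a rough illustration, one common form of checkpoint merging is a linear interpolation of parameters between the RL-trained model and its base model; the sketch below assumes that approach, and the RL checkpoint path and merge ratio `alpha` are hypothetical placeholders, not the actual recipe used for this release.

```python
# Illustrative sketch of weight merging via per-tensor linear interpolation.
# Assumptions: "path/to/dler-rl-checkpoint" and alpha are hypothetical placeholders,
# not the values used for this release.
import torch
from transformers import AutoModelForCausalLM

rl_model = AutoModelForCausalLM.from_pretrained("path/to/dler-rl-checkpoint", torch_dtype=torch.bfloat16)
base_model = AutoModelForCausalLM.from_pretrained("nvidia/Llama-3.1-Nemotron-Nano-8B-v1", torch_dtype=torch.bfloat16)

alpha = 0.5  # hypothetical interpolation weight given to the RL-trained parameters
base_state = base_model.state_dict()
merged_state = {
    name: alpha * param + (1.0 - alpha) * base_state[name]
    for name, param in rl_model.state_dict().items()
}

rl_model.load_state_dict(merged_state)  # keep the RL model's config and architecture
rl_model.save_pretrained("dler-nemotron-8b-merged")
```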
Evaluation Results:
| Model | MATH | Length | AIME | Length | AMC | Length | Minerva | Length | Olympiad | Length | Total Avg Length |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-Nemotron-Nano-8B-v1 | 95.4 | 3069 | 66.4 | 9899 | 88.25 | 6228 | 52.38 | 4031 | 64.33 | 6755 | 5996 |
| DLER-Llama-Nemotron-8B-Merge | 95.2 | 1995 | 66.7 | 5013 | 89.23 | 3358 | 53.19 | 2301 | 65.39 | 3520 | 3237 (-46%) |
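The -46% in the last column is the relative reduction in total average response length versus the baseline: (5996 - 3237) / 5996 ≈ 46%.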
Environment Setup
```bash
pip install transformers==4.51.3
```
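The pinned transformers package does not install PyTorch, which the inference example below also needs. The optional check below is a minimal sketch, assuming torch is already installed for your CUDA setup, to confirm the environment before running inference.

```python
# Optional environment sanity check before running the inference example.
import torch
import transformers

print(transformers.__version__)   # expected: 4.51.3
print(torch.cuda.is_available())  # the example below falls back to CPU if this is False
```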
Inference:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research")
model = AutoModelForCausalLM.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research").to(device)

# The "detailed thinking on" system prompt enables the model's reasoning mode.
messages = [
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \\boxed{}.\nQuestion: Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"},
]

# Build the prompt with the model's chat template and move it to the model's device.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=10000,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
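Since the prompt asks for the final answer in \boxed{}, a small post-processing step can pull it out of the decoded text. The helper below is a minimal sketch that continues from the example above (it reuses `tokenizer` and `outputs`); it simply takes the contents of the last \boxed{...} span, tracking nested braces, and is not part of the official inference recipe.

```python
# Minimal post-processing sketch: extract the contents of the last \boxed{...} span.
# Continues from the inference example above (reuses `tokenizer` and `outputs`).
def extract_boxed_answer(text):
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i, depth, answer = start + len("\\boxed{"), 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        answer.append(ch)
        i += 1
    return "".join(answer)

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extract_boxed_answer(decoded))  # for the example question, the expected answer is (3, pi/2)
```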
License/Terms of Use
NSCLv1
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
Citation
If you find our model helpful, please cite the following paper:
```bibtex
@article{liu2025dler,
  title={DLER: Doing Length pEnalty Right-Incentivizing More Intelligence per Token via Reinforcement Learning},
  author={Liu, Shih-Yang and Dong, Xin and Lu, Ximing and Diao, Shizhe and Liu, Mingjie and Chen, Min-Hung and Yin, Hongxu and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Choi, Yejin and others},
  journal={arXiv preprint arXiv:2510.15110},
  year={2025}
}
```