Model Overview

DLER-Llama-Nemotron-8B-Merge
🚀 The leading efficient reasoning model for cutting-edge research and development 🌟


Figure: Comparison between Llama-3.1-Nemotron-Nano-8B-v1 and DLER-Llama-Nemotron-8B-Merge.

Description:

DLER-Llama-Nemotron-8B-Merge is an ultra-efficient 8B open-weight reasoning model designed for challenging tasks such as mathematics, programming, and scientific problem-solving. It is first trained with the DLER algorithm on agentica-org/DeepScaleR-Preview-Dataset and then merged with the base model via a weight-merging technique to mitigate accuracy degradation. Compared to Llama-3.1-Nemotron-Nano-8B-v1, DLER-Llama-Nemotron-8B-Merge achieves substantial efficiency gains, reducing the average response length by nearly 50% across diverse mathematical benchmarks without sacrificing accuracy.
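
The weight-merging step can be thought of as a linear interpolation between the DLER-trained weights and the base-model weights. The sketch below is illustrative only: the merge ratio alpha, the placeholder RL-checkpoint path, and the use of plain linear interpolation are assumptions, not the exact recipe used to produce this release.

import torch
from transformers import AutoModelForCausalLM

# Hypothetical merge ratio; the value actually used for this checkpoint is not stated here.
alpha = 0.5

base = AutoModelForCausalLM.from_pretrained(
    "nvidia/Llama-3.1-Nemotron-Nano-8B-v1", torch_dtype=torch.bfloat16)
rl = AutoModelForCausalLM.from_pretrained(
    "path/to/dler-rl-checkpoint", torch_dtype=torch.bfloat16)  # placeholder path

# Interpolate every weight tensor between the RL-trained model and the base model.
base_sd = base.state_dict()
rl_sd = rl.state_dict()
merged_sd = {name: alpha * rl_sd[name] + (1 - alpha) * p for name, p in base_sd.items()}

rl.load_state_dict(merged_sd)
rl.save_pretrained("DLER-Llama-Nemotron-8B-Merge")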

This model is for research and development only.

Evaluation Results:

| Model | MATH | Length | AIME | Length | AMC | Length | Minerva | Length | Olympiad | Length | Avg. Length |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-Nemotron-Nano-8B-v1 | 95.4 | 3069 | 66.4 | 9899 | 88.25 | 6228 | 52.38 | 4031 | 64.33 | 6755 | 5996 |
| DLER-Llama-Nemotron-8B-Merge | 95.2 | 1995 | 66.7 | 5013 | 89.23 | 3358 | 53.19 | 2301 | 65.39 | 3520 | 3237 (-46%) |
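
The average-length column follows directly from the per-benchmark lengths; a small check of the arithmetic, using only the numbers in the table above:

# Per-benchmark average response lengths from the table above.
baseline_lengths = [3069, 9899, 6228, 4031, 6755]  # Llama-3.1-Nemotron-Nano-8B-v1
dler_lengths = [1995, 5013, 3358, 2301, 3520]      # DLER-Llama-Nemotron-8B-Merge

baseline_avg = sum(baseline_lengths) / len(baseline_lengths)  # ~5996
dler_avg = sum(dler_lengths) / len(dler_lengths)              # ~3237
reduction = 1 - dler_avg / baseline_avg                       # ~0.46, i.e. -46%
print(f"{baseline_avg:.0f} -> {dler_avg:.0f} ({reduction:.0%} shorter)")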

Environment Setup

pip install transformers==4.51.3
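
The inference example below also needs PyTorch; if it is not already in your environment, install it as well (version unpinned here, any recent release should work):

pip install torch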

Inference:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use a GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research")
model = AutoModelForCausalLM.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research").to(device)

# The system message "detailed thinking on" enables the model's reasoning mode.
messages = [{"role": "system", "content": "detailed thinking on"}, {"role": "user", "content": "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \\boxed{}.\nQuestion: Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"}]

# Apply the chat template and move the input ids to the model's device.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

# Generate the reasoning trace and final answer.
outputs = model.generate(
    tokenized_chat,
    max_new_tokens=10000,
    eos_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
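
Continuing from the example above, two optional tweaks may be useful: loading the checkpoint directly in BF16 (its stored precision) to reduce memory, and pulling the final \boxed{} answer out of the decoded text. This is a minimal sketch, not an official recipe; the dtype choice and the extraction helper are assumptions.

import torch
from transformers import AutoModelForCausalLM

# Alternative to the from_pretrained call above: load in bfloat16 (the stored
# precision) to roughly halve GPU memory versus fp32.
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/DLER-Llama-Nemotron-8B-Merge-Research",
    torch_dtype=torch.bfloat16,
).to(device)

def extract_boxed_answer(text):
    # Illustrative helper: return the contents of the last \boxed{...},
    # handling nested braces.
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, chars = 1, []
    while i < len(text) and depth > 0:
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth > 0:
            chars.append(ch)
        i += 1
    return "".join(chars)

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extract_boxed_answer(decoded))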

License/Terms of Use

NSCLv1

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

Citation

If you find our model helpful, please cite the following paper:

@article{liu2025dler,
  title={DLER: Doing Length pEnalty Right-Incentivizing More Intelligence per Token via Reinforcement Learning},
  author={Liu, Shih-Yang and Dong, Xin and Lu, Ximing and Diao, Shizhe and Liu, Mingjie and Chen, Min-Hung and Yin, Hongxu and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Choi, Yejin and others},
  journal={arXiv preprint arXiv:2510.15110},
  year={2025}
}