Model Overview
Description:
DLER-Llama-3.1-Nemotron-8B is an ultra-efficient 8B open-weight reasoning model designed for challenging tasks such as mathematics, programming, and scientific problem-solving. It is first trained with the DLER algorithm on agentica-org/DeepScaleR-Preview-Dataset and then merged with the base model using a weight-merging technique to mitigate accuracy degradation. Compared to Llama-3.1-Nemotron-Nano-8B-v1, DLER-Llama-Nemotron-8B-Merge achieves substantial efficiency gains, reducing the average response length by nearly 50% across diverse mathematical benchmarks without sacrificing accuracy.
This model is for research and development only.
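The weight-merging step is described above only at a high level. As a rough illustration, one common form of checkpoint merging is a linear interpolation of parameters between the RL-trained model and its base model; the sketch below assumes that approach, and the RL checkpoint path and merge ratio `alpha` are hypothetical placeholders, not the actual recipe used for this release.

```python
# Illustrative sketch of weight merging via per-tensor linear interpolation.
# Assumptions: "path/to/dler-rl-checkpoint" and alpha are hypothetical placeholders,
# not the values used for this release.
import torch
from transformers import AutoModelForCausalLM

rl_model = AutoModelForCausalLM.from_pretrained("path/to/dler-rl-checkpoint", torch_dtype=torch.bfloat16)
base_model = AutoModelForCausalLM.from_pretrained("nvidia/Llama-3.1-Nemotron-Nano-8B-v1", torch_dtype=torch.bfloat16)

alpha = 0.5  # hypothetical interpolation weight given to the RL-trained parameters
base_state = base_model.state_dict()
merged_state = {
    name: alpha * param + (1.0 - alpha) * base_state[name]
    for name, param in rl_model.state_dict().items()
}

rl_model.load_state_dict(merged_state)  # keep the RL model's config and architecture
rl_model.save_pretrained("dler-nemotron-8b-merged")
```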
Evaluation Results:
| Model | MATH | Length | AIME | Length | AMC | Length | Minerva | Length | Olympiad | Length | Total Avg Length |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-Nemotron-Nano-8B-v1 | 95.4 | 3069 | 66.4 | 9899 | 88.25 | 6228 | 52.38 | 4031 | 64.33 | 6755 | 5996 |
| DLER-Llama-Nemotron-8B-Merge | 95.2 | 1995 | 66.7 | 5013 | 89.23 | 3358 | 53.19 | 2301 | 65.39 | 3520 | 3237 (-46%) |
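The -46% in the last column is the relative reduction in total average response length versus the baseline: (5996 - 3237) / 5996 ≈ 46%.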
Environment Setup
```bash
pip install transformers==4.51.3
```
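The pinned transformers package does not install PyTorch, which the inference example below also needs. The optional check below is a minimal sketch, assuming torch is already installed for your CUDA setup, to confirm the environment before running inference.

```python
# Optional environment sanity check before running the inference example.
import torch
import transformers

print(transformers.__version__)   # expected: 4.51.3
print(torch.cuda.is_available())  # the example below falls back to CPU if this is False
```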
Inference:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research")
model = AutoModelForCausalLM.from_pretrained("nvidia/DLER-Llama-Nemotron-8B-Merge-Research").to(device)

# The "detailed thinking on" system prompt enables the model's reasoning mode.
messages = [
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "Below is a math question. I want you to reason through the steps and then give a final answer. Your final answer should be in \\boxed{}.\nQuestion: Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"},
]

# Build the prompt with the model's chat template and move it to the model's device.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=10000,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
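Since the prompt asks for the final answer in \boxed{}, a small post-processing step can pull it out of the decoded text. The helper below is a minimal sketch that continues from the example above (it reuses `tokenizer` and `outputs`); it simply takes the contents of the last \boxed{...} span, tracking nested braces, and is not part of the official inference recipe.

```python
# Minimal post-processing sketch: extract the contents of the last \boxed{...} span.
# Continues from the inference example above (reuses `tokenizer` and `outputs`).
def extract_boxed_answer(text):
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i, depth, answer = start + len("\\boxed{"), 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        answer.append(ch)
        i += 1
    return "".join(answer)

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extract_boxed_answer(decoded))  # for the example question, the expected answer is (3, pi/2)
```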
License/Terms of Use
NSCLv1
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
Citation
If you find our model helpful, please cite the following paper:
```bibtex
@article{liu2025dler,
  title={DLER: Doing Length pEnalty Right-Incentivizing More Intelligence per Token via Reinforcement Learning},
  author={Liu, Shih-Yang and Dong, Xin and Lu, Ximing and Diao, Shizhe and Liu, Mingjie and Chen, Min-Hung and Yin, Hongxu and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Choi, Yejin and others},
  journal={arXiv preprint arXiv:2510.15110},
  year={2025}
}
```