Improve model card: Add metadata, paper, code, and usage example
This PR significantly enhances the model card by:
- Linking the model to its official Hugging Face paper page at https://huggingface.co/papers/2507.03112.
- Adding the `text-generation` pipeline tag, which helps users discover the model at https://huggingface.co/models?pipeline_tag=text-generation.
- Specifying `transformers` as the compatible library, enabling the "Use in Transformers" widget.
- Including relevant `tags` such as `dialogue`, `empathy`, `reinforcement-learning`, and `qwen` for better discoverability.
- Providing a direct link to the GitHub repository.
- Incorporating the full paper abstract and a key framework image.
- Providing a comprehensive usage example for inference with the `transformers` library, including handling the Qwen2.5 chat template.
- Adding a BibTeX citation.
These changes aim to greatly improve the model's visibility, usability, and documentation for the community.

---
base_model:
- Qwen/Qwen2.5-7B-Instruct
license: other
license_name: license
license_link: LICENSE
pipeline_tag: text-generation
library_name: transformers
tags:
- dialogue
- empathy
- reinforcement-learning
- qwen
---

# RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

This repository contains the `Qwen2.5-7B-Instruct` model fine-tuned using RLVER, as presented in the paper [RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents](https://huggingface.co/papers/2507.03112).

<div align="center">
<a href="https://github.com/Tencent/digitalhuman/tree/main/RLVER"><img src="https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white" alt="GitHub"></a>
<a href="https://huggingface.co/papers/2507.03112"><img src="https://img.shields.io/badge/Paper-2507.03112-b31b1b.svg?style=for-the-badge" alt="arXiv"></a>
</div>

## Abstract

Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. While reinforcement learning from verifiable rewards (RLVR) has advanced in other domains, its application to dialogue—especially for emotional intelligence—remains underexplored. In this work, we introduce RLVER, the first end-to-end reinforcement learning framework that leverages verifiable emotion rewards from simulated users to cultivate higher-order empathetic abilities in LLMs. Within this framework, self-consistent affective simulated users engage in dialogue rollouts and produce deterministic emotion scores during conversations, serving as reward signals to guide the LLM's learning. Fine-tuning the publicly available Qwen2.5-7B-Instruct model with PPO boosts its Sentient-Benchmark score from 13.3 to 79.2 while largely preserving mathematical and coding competence. Extensive experiments reveal that: (i) RLVER consistently improves multiple dialogue capabilities; (ii) Thinking and non-thinking models show distinct trends—thinking models excel in empathy and insight, while non-thinking models favor action; (iii) GRPO often yields stable gains, while PPO can push certain capabilities to a higher ceiling; (iv) More challenging environments are not always better—moderate ones can yield stronger outcomes. Our results show that RLVER is a practical route toward emotionally intelligent and broadly capable language agents.

## Overview

<p align="center"><img width="90%" src="https://raw.githubusercontent.com/Tencent/digitalhuman/main/RLVER/code/figs/framework.png" /></p>
<p align="center"><em>The overview of RLVER. In this work, we present the first end-to-end reinforcement-learning framework that equips an LLM with human-level empathetic skills by optimizing against verifiable emotion rewards.</em></p>
+
|
| 32 |
+
## Usage
|
| 33 |
+
|
| 34 |
+
This model can be loaded and used with the `transformers` library. Ensure you have the library installed (`pip install transformers`).
|
| 35 |
+
|
| 36 |
+

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "Tencent/RLVER-Qwen2.5-7B"  # Replace with the actual model ID if a specific sub-model is desired
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your GPU
    device_map="auto"
)

# Example chat interaction following the Qwen2.5 chat template
messages = [
    {"role": "system", "content": "You are a helpful and empathetic assistant."},
    {"role": "user", "content": "I'm feeling really down today, I lost my job and I don't know what to do."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate a response
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id  # avoids padding warnings with Qwen2.5
)

# Decode only the newly generated tokens (the assistant's reply)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

# Example response:
# I'm so sorry to hear that. Losing your job can be incredibly tough, and it's completely
# understandable to feel down. Please know that you're not alone, and it's okay to feel this
# way. Take some time to process your emotions. Is there anything specific you'd like to talk
# about or any way I can help you right now?
```
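
For multi-turn conversations, you can keep appending the model's replies and the user's follow-ups to `messages` and re-apply the chat template each turn. A minimal sketch, reusing the `model` and `tokenizer` loaded above:

```python
def chat(messages, max_new_tokens=256):
    """Generate one assistant turn for the running `messages` list."""
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

messages = [{"role": "system", "content": "You are a helpful and empathetic assistant."}]
for user_turn in ["I lost my job today.", "I'm worried I won't find anything new."]:
    messages.append({"role": "user", "content": user_turn})
    reply = chat(messages)
    messages.append({"role": "assistant", "content": reply})
    print(f"Assistant: {reply}")
```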

## Citation

If you find our work helpful or inspiring, please feel free to cite it.

```bibtex
@article{zhou2024learning,
  title={RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents},
  author={Zhou, Zijian and Liu, Shikun and Han, Xiao and Liu, Haozhe and Ng, Kam Woh and Xie, Tian and Cong, Yuren and Li, Hang and Xu, Mengmeng and P{\'e}rez-R{\'u}a, Juan-Manuel and Patel, Aditya and Xiang, Tao and Shi, Miaojing and He, Sen},
  journal={arXiv preprint arXiv:2507.03112},
  year={2025}
}
```