- Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
  Paper • 2406.12624 • Published • 37
- A Survey on LLM-as-a-Judge
  Paper • 2411.15594 • Published
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
  Paper • 2412.05579 • Published • 2
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
  Paper • 2411.16594 • Published • 41
Lu Zhang
kaitou951
AI & ML interests
None yet
Organizations
None yet
Daily Papers
- Robust Multimodal Large Language Models Against Modality Conflict
  Paper • 2507.07151 • Published • 5
- One Token to Fool LLM-as-a-Judge
  Paper • 2507.08794 • Published • 31
- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 106
- KV Cache Steering for Inducing Reasoning in Small Language Models
  Paper • 2507.08799 • Published • 40
models
0
None public yet
datasets
0
None public yet