RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models Paper • 2310.00746 • Published Oct 1, 2023 • 1
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models Paper • 2410.11710 • Published Oct 15, 2024 • 20
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 19
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 104
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26, 2025 • 28
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5, 2025 • 33
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8, 2025 • 43
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21, 2025 • 22
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23, 2025 • 27