MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use • Paper • 2509.24002 • Published 29 days ago • 168
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos • Paper • 2505.23693 • Published May 29 • 55
PriM: Principle-Inspired Material Discovery through Multi-Agent Collaboration • Paper • 2504.08810 • Published Apr 9 • 1