SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code Paper • 2506.05692 • Published Jun 6
DevBench: A Comprehensive Benchmark for Software Development Paper • 2403.08604 • Published Mar 13, 2024
CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering Paper • 2501.03447 • Published Jan 7
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling Paper • 2507.23370 • Published Jul 31
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published Aug 5
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning Paper • 2502.20127 • Published Feb 27