SWE-bench - a princeton-nlp Collection

SWE-bench

updated Mar 8

SWE-bench is a benchmark for evaluating language models and AI systems on their ability to resolve real-world GitHub issues.