BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 34
Article BigCodeArena: Judging code generations end to end with code executions By bigcode • 21 days ago • 16
Article Gaia2 Leaderboard Update: New Models and New Observations By meta-agents-research-environments and 3 others • 26 days ago • 9
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 112
Article Back to The Future: Evaluating AI Agents on Predicting Future Events Jul 17 • 44
Article Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever. By MaziyarPanahi • Jul 16 • 143
Article Bringing Fusion Down to Earth: ML for Stellarator Optimization By cgeorgiaw • Jul 2 • 75
Article LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs By davidberenstein1957 and 3 others • Jul 2 • 16
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 54
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 84