GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025 • 4
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 140
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19, 2025 • 88
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 40
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 30
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving Paper • 2507.17596 • Published Jul 23, 2025 • 5
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24, 2025 • 5
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Paper • 2507.10510 • Published Jul 14, 2025 • 4
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25, 2025 • 28
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22, 2025 • 7
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 257
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 236
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22, 2025 • 120
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8, 2025 • 118
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 131
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 12
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 23
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents Paper • 2505.22954 • Published May 29, 2025 • 14
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis Paper • 2505.11581 • Published May 16, 2025 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 126
Gorilla: Large Language Model Connected with Massive APIs Paper • 2305.15334 • Published May 24, 2023 • 5
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace Paper • 2303.17580 • Published Mar 30, 2023 • 13
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework Paper • 2308.08155 • Published Aug 16, 2023 • 10
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 34
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published 22 days ago • 94
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 185
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues Paper • 2501.10836 • Published Jan 18, 2025 • 1