stjefan2's Collections
Articles
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper
•
2311.00176
•
Published
•
9
Language Models can be Logical Solvers
Paper
•
2311.06158
•
Published
•
23
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Paper
•
2311.05997
•
Published
•
37
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Paper
•
2311.05657
•
Published
•
32
JaxMARL: Multi-Agent RL Environments in JAX
Paper
•
2311.10090
•
Published
•
8
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks
Paper
•
2311.09835
•
Published
•
11
Large Language Models for Mathematicians
Paper
•
2312.04556
•
Published
•
13
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Paper
•
2312.11370
•
Published
•
20
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Paper
•
2401.00935
•
Published
•
18
Teaching Large Language Models to Reason with Reinforcement Learning
Paper
•
2403.04642
•
Published
•
50
LLM Agent Operating System
Paper
•
2403.16971
•
Published
•
71
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
•
2501.04519
•
Published
•
285
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
115
AgentRxiv: Towards Collaborative Autonomous Research
Paper
•
2503.18102
•
Published
•
25
TTRL: Test-Time Reinforcement Learning
Paper
•
2504.16084
•
Published
•
120
Learning Adaptive Parallel Reasoning with Language Models
Paper
•
2504.15466
•
Published
•
43
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper
•
2504.16078
•
Published
•
21
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper
•
2504.15257
•
Published
•
47
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
•
2504.17192
•
Published
•
120
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Paper
•
2504.16891
•
Published
•
25
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Paper
•
2504.16656
•
Published
•
57
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
•
2505.05470
•
Published
•
85
Measuring General Intelligence with Generated Games
Paper
•
2505.07215
•
Published
•
11
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Paper
•
2505.19914
•
Published
•
43
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Paper
•
2506.11928
•
Published
•
23
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Paper
•
2506.14234
•
Published
•
41
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Paper
•
2506.09050
•
Published
•
6
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Paper
•
2506.15211
•
Published
•
37
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
Paper
•
2506.15672
•
Published
•
15
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
Paper
•
2505.16938
•
Published
•
120
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Paper
•
2506.20639
•
Published
•
31
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper
•
2507.13158
•
Published
•
24
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Paper
•
2507.14111
•
Published
•
23
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Paper
•
2508.09834
•
Published
•
53
Deep Think with Confidence
Paper
•
2508.15260
•
Published
•
87
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
•
2509.07980
•
Published
•
98
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Paper
•
2509.06949
•
Published
•
56
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents
Paper
•
2509.06917
•
Published
•
41
The Majority is not always right: RL training for solution aggregation
Paper
•
2509.06870
•
Published
•
16
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
Paper
•
2509.16941
•
Published
•
20
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
Paper
•
2509.21766
•
Published
•
23
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
446
MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline
Paper
•
2510.07307
•
Published
•
5
Parallel Test-Time Scaling for Latent Reasoning Models
Paper
•
2510.07745
•
Published
•
5
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Paper
•
2510.14943
•
Published
•
37