JuanRafap's Collections
		Interés 
		
	updated
			
 
				
				
 - WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum
  Reinforcement Learning- 
			Paper
			 •- 
			2411.02337
			 •
			Published
				
			•- 
				36
			 
 - Mixture-of-Transformers: A Sparse and Scalable Architecture for
  Multi-Modal Foundation Models- 
			Paper
			 •- 
			2411.04996
			 •
			Published
				
			•- 
				51
			 
 - Large Language Models Orchestrating Structured Reasoning Achieve Kaggle
  Grandmaster Level- 
			Paper
			 •- 
			2411.03562
			 •
			Published
				
			•- 
				67
			 
 - StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via
  Inference-time Hybrid Information Structurization- 
			Paper
			 •- 
			2410.08815
			 •
			Published
				
			•- 
				47
			 
 - Game-theoretic LLM: Agent Workflow for Negotiation Games- 
			Paper
			 •- 
			2411.05990
			 •
			Published
				
			•- 
				8
			 
 - BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large
  Language Models on Mobile Devices- 
			Paper
			 •- 
			2411.10640
			 •
			Published
				
			•- 
				46
			 
 - Puzzle: Distillation-Based NAS for Inference-Optimized LLMs- 
			Paper
			 •- 
			2411.19146
			 •
			Published
				
			•- 
				17
			 
   - Snowflake/snowflake-arctic-embed-m-v2.0- 
			Sentence Similarity
			 • 
		
				0.3B
			• 
	
				Updated
					
				
				•- 
					83.1k
				
	
				 •- 
					90
				 
   - Snowflake/snowflake-arctic-embed-l-v2.0- 
			Sentence Similarity
			 • 
		
				0.6B
			• 
	
				Updated
					
				
				•- 
					335k
				 • 
			
	
				•- 
					206
				 
 - EXAONE 3.5: Series of Large Language Models for Real-world Use Cases- 
			Paper
			 •- 
			2412.04862
			 •
			Published
				
			•- 
				50
			 
   - ruliad/deepthought-8b-llama-v0.01-alpha- 
			Text Generation
			 • 
		
				8B
			• 
	
				Updated
					
				
				•- 
					13
				
	
				 •- 
					146
				 
 - Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's
  Reasoning Capability- 
			Paper
			 •- 
			2411.19943
			 •
			Published
				
			•- 
				63
			 
 - OCR Hinders RAG: Evaluating the Cascading Impact of OCR on
  Retrieval-Augmented Generation- 
			Paper
			 •- 
			2412.02592
			 •
			Published
				
			•- 
				24
			 
 - RL Zero: Zero-Shot Language to Behaviors without any Supervision- 
			Paper
			 •- 
			2412.05718
			 •
			Published
				
			•- 
				5
			 
 - VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal
  Retrieval-Augmented Generation- 
			Paper
			 •- 
			2412.10704
			 •
			Published
				
			•- 
				16
			 
 - RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented
  Generation for Preference Alignment- 
			Paper
			 •- 
			2412.13746
			 •
			Published
				
			•- 
				9
			 
 - Wonderful Matrices: Combining for a More Efficient and Effective
  Foundation Model Architecture- 
			Paper
			 •- 
			2412.11834
			 •
			Published
				
			•- 
				8
			 
 - Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation
  Model Internet Agents- 
			Paper
			 •- 
			2412.13194
			 •
			Published
				
			•- 
				12
			 
 - ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing- 
			Paper
			 •- 
			2412.14711
			 •
			Published
				
			•- 
				16
			 
 - Ensembling Large Language Models with Process Reward-Guided Tree Search
  for Better Complex Reasoning- 
			Paper
			 •- 
			2412.15797
			 •
			Published
				
			•- 
				18
			 
 - Progressive Multimodal Reasoning via Active Retrieval- 
			Paper
			 •- 
			2412.14835
			 •
			Published
				
			•- 
				73
			 
 - MixLLM: LLM Quantization with Global Mixed-precision between
  Output-features and Highly-efficient System Design- 
			Paper
			 •- 
			2412.14590
			 •
			Published
				
			•- 
				14
			 
 - Learned Compression for Compressed Learning- 
			Paper
			 •- 
			2412.09405
			 •
			Published
				
			•- 
				13
			 
 - Token-Budget-Aware LLM Reasoning- 
			Paper
			 •- 
			2412.18547
			 •
			Published
				
			•- 
				46
			 
   - ericsonwillians/distilbert-base-uncased-steam-sentiment- 
			Text Classification
			 • 
		
				67M
			• 
	
				Updated
					
				
				•- 
					8
				
	
				
				 
 - Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via
  Collective Monte Carlo Tree Search- 
			Paper
			 •- 
			2412.18319
			 •
			Published
				
			•- 
				39
			 
 - Personalized Graph-Based Retrieval for Large Language Models- 
			Paper
			 •- 
			2501.02157
			 •
			Published
				
			•- 
				31
			 
 - HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs- 
			Paper
			 •- 
			2412.18925
			 •
			Published
				
			•- 
				104
			 
 - Multi-task retriever fine-tuning for domain-specific and efficient RAG- 
			Paper
			 •- 
			2501.04652
			 •
			Published
				
			•- 
				10
			 
 - Search-o1: Agentic Search-Enhanced Large Reasoning Models- 
			Paper
			 •- 
			2501.05366
			 •
			Published
				
			•- 
				102
			 
 - DepthMaster: Taming Diffusion Models for Monocular Depth Estimation- 
			Paper
			 •- 
			2501.02576
			 •
			Published
				
			•- 
				15
			 
 - REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
  Models- 
			Paper
			 •- 
			2501.03262
			 •
			Published
				
			•- 
				102
			 
 - BoostStep: Boosting mathematical capability of Large Language Models via
  improved single-step reasoning- 
			Paper
			 •- 
			2501.03226
			 •
			Published
				
			•- 
				44
			 
 - Evolving Deeper LLM Thinking- 
			Paper
			 •- 
			2501.09891
			 •
			Published
				
			•- 
				115
			 
 - Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
  Large Language Models- 
			Paper
			 •- 
			2501.09686
			 •
			Published
				
			•- 
				41
			 
 - RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation- 
			Paper
			 •- 
			2501.08617
			 •
			Published
				
			•- 
				10
			 
 - The Lessons of Developing Process Reward Models in Mathematical
  Reasoning- 
			Paper
			 •- 
			2501.07301
			 •
			Published
				
			•- 
				99
			 
 - Multimodal LLMs Can Reason about Aesthetics in Zero-Shot- 
			Paper
			 •- 
			2501.09012
			 •
			Published
				
			•- 
				10
			 
 - ChemAgent: Self-updating Library in Large Language Models Improves
  Chemical Reasoning- 
			Paper
			 •- 
			2501.06590
			 •
			Published
				
			•- 
				11
			 
 - CodeElo: Benchmarking Competition-level Code Generation of LLMs with
  Human-comparable Elo Ratings- 
			Paper
			 •- 
			2501.01257
			 •
			Published
				
			•- 
				52
			 
 - Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
  Feedback- 
			Paper
			 •- 
			2501.10799
			 •
			Published
				
			•- 
				15
			 
 - Control LLM: Controlled Evolution for Intelligence Retention in LLM- 
			Paper
			 •- 
			2501.10979
			 •
			Published
				
			•- 
				6
			 
 - Autonomy-of-Experts Models- 
			Paper
			 •- 
			2501.13074
			 •
			Published
				
			•- 
				44
			 
 - Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
  Language Models via a Multi-Paradigm Perspective- 
			Paper
			 •- 
			2501.11110
			 •
			Published
				
			•- 
				4
			 
 - Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning- 
			Paper
			 •- 
			2412.09078
			 •
			Published
 - LLM2: Let Large Language Models Harness System 2 Reasoning- 
			Paper
			 •- 
			2412.20372
			 •
			Published
 - TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
  Internalization with Self-Reflection- 
			Paper
			 •- 
			2412.08024
			 •
			Published
				
			•- 
				1
			 
 - Table as Thought: Exploring Structured Thoughts in LLM Reasoning- 
			Paper
			 •- 
			2501.02152
			 •
			Published
 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
  Reinforcement Learning- 
			Paper
			 •- 
			2501.12948
			 •
			Published
				
			•- 
				420
			 
 - Self-supervised Quantized Representation for Seamlessly Integrating
  Knowledge Graphs with Large Language Models- 
			Paper
			 •- 
			2501.18119
			 •
			Published
				
			•- 
				25
			 
 - Preference Leakage: A Contamination Problem in LLM-as-a-judge- 
			Paper
			 •- 
			2502.01534
			 •
			Published
				
			•- 
				40
			 
 - The Differences Between Direct Alignment Algorithms are a Blur- 
			Paper
			 •- 
			2502.01237
			 •
			Published
				
			•- 
				113
			 
 - SRMT: Shared Memory for Multi-agent Lifelong Pathfinding- 
			Paper
			 •- 
			2501.13200
			 •
			Published
				
			•- 
				68
			 
 - The Jumping Reasoning Curve? Tracking the Evolution of Reasoning
  Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles- 
			Paper
			 •- 
			2502.01081
			 •
			Published
				
			•- 
				14
			 
 - CODESIM: Multi-Agent Code Generation and Problem Solving through
  Simulation-Driven Planning and Debugging- 
			Paper
			 •- 
			2502.05664
			 •
			Published
				
			•- 
				24
			 
 - Training Language Models for Social Deduction with Multi-Agent
  Reinforcement Learning- 
			Paper
			 •- 
			2502.06060
			 •
			Published
				
			•- 
				38
			 
 - 
			Paper
			 •- 
			2502.06049
			 •
			Published
				
			•- 
				30
			 
 - Exploring the Limit of Outcome Reward for Learning Mathematical
  Reasoning- 
			Paper
			 •- 
			2502.06781
			 •
			Published
				
			•- 
				59
			 
 - Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
  Scaling- 
			Paper
			 •- 
			2502.06703
			 •
			Published
				
			•- 
				153
			 
 - CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference- 
			Paper
			 •- 
			2502.04416
			 •
			Published
				
			•- 
				12
			 
 - Goku: Flow Based Video Generative Foundation Models- 
			Paper
			 •- 
			2502.04896
			 •
			Published
				
			•- 
				106
			 
 - In-Context Retrieval-Augmented Language Models- 
			Paper
			 •- 
			2302.00083
			 •
			Published
				
			•- 
				1
			 
 - CodeCriticBench: A Holistic Code Critique Benchmark for Large Language
  Models- 
			Paper
			 •- 
			2502.16614
			 •
			Published
				
			•- 
				27
			 
 - Can Large Language Models Detect Errors in Long Chain-of-Thought
  Reasoning?- 
			Paper
			 •- 
			2502.19361
			 •
			Published
				
			•- 
				28
			 
 - STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task
  Planning- 
			Paper
			 •- 
			2502.10177
			 •
			Published
				
			•- 
				6
			 
 - Goedel-Prover: A Frontier Model for Open-Source Automated Theorem
  Proving- 
			Paper
			 •- 
			2502.07640
			 •
			Published
				
			•- 
				10
			 
 - LoRACode: LoRA Adapters for Code Embeddings- 
			Paper
			 •- 
			2503.05315
			 •
			Published
				
			•- 
				13
			 
 - Learning from Failures in Multi-Attempt Reinforcement Learning- 
			Paper
			 •- 
			2503.04808
			 •
			Published
				
			•- 
				18
			 
   - ds4sd/SmolDocling-256M-preview- 
			Image-Text-to-Text
			 • 
		
				0.3B
			• 
	
				Updated
					
				
				•- 
					332k
				
	
				 •- 
					1.59k
				 
 - Bridging Continuous and Discrete Tokens for Autoregressive Visual
  Generation- 
			Paper
			 •- 
			2503.16430
			 •
			Published
				
			•- 
				34
			 
 - MAPS: A Multi-Agent Framework Based on Big Seven Personality and
  Socratic Guidance for Multimodal Scientific Problem Solving- 
			Paper
			 •- 
			2503.16905
			 •
			Published
				
			•- 
				54
			 
 - Improving Autoregressive Image Generation through Coarse-to-Fine Token
  Prediction- 
			Paper
			 •- 
			2503.16194
			 •
			Published
				
			•- 
				8
			 
 - ELTEX: A Framework for Domain-Driven Synthetic Data Generation- 
			Paper
			 •- 
			2503.15055
			 •
			Published
				
			•- 
				6
			 
 - Reinforcement Learning for Reasoning in Small LLMs: What Works and What
  Doesn't- 
			Paper
			 •- 
			2503.16219
			 •
			Published
				
			•- 
				52
			 
 - CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners- 
			Paper
			 •- 
			2503.16356
			 •
			Published
				
			•- 
				15
			 
 - DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement
  Learning- 
			Paper
			 •- 
			2503.15265
			 •
			Published
				
			•- 
				46
			 
 - DAPO: An Open-Source LLM Reinforcement Learning System at Scale- 
			Paper
			 •- 
			2503.14476
			 •
			Published
				
			•- 
				141
			 
 - Advancing Language Model Reasoning through Reinforcement Learning and
  Inference Scaling- 
			Paper
			 •- 
			2501.11651
			 •
			Published
				
			•- 
				1
			 
 - API Agents vs. GUI Agents: Divergence and Convergence- 
			Paper
			 •- 
			2503.11069
			 •
			Published
				
			•- 
				37
			 
 - R1-VL: Learning to Reason with Multimodal Large Language Models via
  Step-wise Group Relative Policy Optimization- 
			Paper
			 •- 
			2503.12937
			 •
			Published
				
			•- 
				30
			 
 - Self-Evolved Preference Optimization for Enhancing Mathematical
  Reasoning in Small Language Models- 
			Paper
			 •- 
			2503.04813
			 •
			Published
				
			•- 
				1
			 
 - Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise
  Rewards for Mathematical Reasoning- 
			Paper
			 •- 
			2502.14356
			 •
			Published
 - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM
  Reasoning via Autoregressive Search- 
			Paper
			 •- 
			2502.02508
			 •
			Published
				
			•- 
				23
			 
   - NousResearch/DeepHermes-3-Mistral-24B-Preview- 
			Text Generation
			 • 
		
				24B
			• 
	
				Updated
					
				
				•- 
					1.34k
				
	
				 •- 
					117
				 
 - GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
  VLM Agent Training- 
			Paper
			 •- 
			2503.08525
			 •
			Published
				
			•- 
				17
			 
 - GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
  for Visual Generation and Editing- 
			Paper
			 •- 
			2503.10639
			 •
			Published
				
			•- 
				53
			 
 - Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
  Thinking- 
			Paper
			 •- 
			2503.19855
			 •
			Published
				
			•- 
				29
			 
 - A Probabilistic Inference Approach to Inference-Time Scaling of LLMs
  using Particle-Based Monte Carlo Methods- 
			Paper
			 •- 
			2502.01618
			 •
			Published
				
			•- 
				10
			 
 - Transformer^2: Self-adaptive LLMs- 
			Paper
			 •- 
			2501.06252
			 •
			Published
				
			•- 
				54
			 
 - MiniMax-01: Scaling Foundation Models with Lightning Attention- 
			Paper
			 •- 
			2501.08313
			 •
			Published
				
			•- 
				298
			 
 - ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language
  Model Born from Transformer- 
			Paper
			 •- 
			2501.15570
			 •
			Published
				
			•- 
				25
			 
   - open-thoughts/OpenThoughts-114k- 
			Viewer
			 • 
	
				Updated
					
				• 
			
			228k
	
				•- 
					42.5k
				
				 •- 
					766
				 
 
 - Beyond Prompt Content: Enhancing LLM Performance via Content-Format
  Integrated Prompt Optimization- 
			Paper
			 •- 
			2502.04295
			 •
			Published
				
			•- 
				13
			 
 - ScholarCopilot: Training Large Language Models for Academic Writing with
  Accurate Citations- 
			Paper
			 •- 
			2504.00824
			 •
			Published
				
			•- 
				43
			 
 - ZClip: Adaptive Spike Mitigation for LLM Pre-Training- 
			Paper
			 •- 
			2504.02507
			 •
			Published
				
			•- 
				88
			 
   - agentica-org/DeepCoder-14B-Preview- 
			Text Generation
			 • 
		
				15B
			• 
	
				Updated
					
				
				•- 
					876
				 • 
			
	
				•- 
					679
				 
 - Hogwild! Inference: Parallel LLM Generation via Concurrent Attention- 
			Paper
			 •- 
			2504.06261
			 •
			Published
				
			•- 
				110
			 
 - VAPO: Efficient and Reliable Reinforcement Learning for Advanced
  Reasoning Tasks- 
			Paper
			 •- 
			2504.05118
			 •
			Published
				
			•- 
				26
			 
 - DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning- 
			Paper
			 •- 
			2504.07128
			 •
			Published
				
			•- 
				86
			 
 - SQL-R1: Training Natural Language to SQL Reasoning Model By
  Reinforcement Learning- 
			Paper
			 •- 
			2504.08600
			 •
			Published
				
			•- 
				31
			 
 - APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
  Agent-Human Interplay- 
			Paper
			 •- 
			2504.03601
			 •
			Published
				
			•- 
				17
			 
 - GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
  Autoregressive Image Generation- 
			Paper
			 •- 
			2504.08736
			 •
			Published
				
			•- 
				46
			 
 - A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
  Reinforce- 
			Paper
			 •- 
			2504.11343
			 •
			Published
				
			•- 
				19
			 
 - DeepRAG: Thinking to Retrieval Step by Step for Large Language Models- 
			Paper
			 •- 
			2502.01142
			 •
			Published
				
			•- 
				24
			 
 - Genius: A Generalizable and Purely Unsupervised Self-Training Framework
  For Advanced Reasoning- 
			Paper
			 •- 
			2504.08672
			 •
			Published
				
			•- 
				55
			 
 - SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question
  Answering?- 
			Paper
			 •- 
			2502.13233
			 •
			Published
				
			•- 
				15
			 
 - LongPO: Long Context Self-Evolution of Large Language Models through
  Short-to-Long Preference Optimization- 
			Paper
			 •- 
			2502.13922
			 •
			Published
				
			•- 
				28
			 
 - WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation- 
			Paper
			 •- 
			2502.08047
			 •
			Published
				
			•- 
				28
			 
 - MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
  Reinforcement Learning- 
			Paper
			 •- 
			2503.07365
			 •
			Published
				
			•- 
				61
			 
 - Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM
  Reasoners With Verifiers- 
			Paper
			 •- 
			2505.04842
			 •
			Published
				
			•- 
				12
			 
 - Benchmarking LLMs' Swarm intelligence- 
			Paper
			 •- 
			2505.04364
			 •
			Published
				
			•- 
				20
			 
 - Are Reasoning Models More Prone to Hallucination?- 
			Paper
			 •- 
			2505.23646
			 •
			Published
				
			•- 
				24
			 
 - Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data
  Could Be Secretly Stolen!- 
			Paper
			 •- 
			2505.15656
			 •
			Published
				
			•- 
				15
			 
 - This Time is Different: An Observability Perspective on Time Series
  Foundation Models- 
			Paper
			 •- 
			2505.14766
			 •
			Published
				
			•- 
				40
			 
 - ReCIT: Reconstructing Full Private Data from Gradient in
  Parameter-Efficient Fine-Tuning of Large Language Models- 
			Paper
			 •- 
			2504.20570
			 •
			Published
 - Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning
  Capabilities Through Evaluation Design- 
			Paper
			 •- 
			2506.04734
			 •
			Published
				
			•- 
				20
			 
 - Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive
  Foundations for Artificial General Intelligence and its Societal Impact- 
			Paper
			 •- 
			2507.00951
			 •
			Published
				
			•- 
				24
			 
 - Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM
  Fine-Tuning Data from Unstructured Documents- 
			Paper
			 •- 
			2507.04009
			 •
			Published
				
			•- 
				48
			 
 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning
  Due to Data Contamination- 
			Paper
			 •- 
			2507.10532
			 •
			Published
				
			•- 
				88
			 
 - LayerCake: Token-Aware Contrastive Decoding within Large Language Model
  Layers- 
			Paper
			 •- 
			2507.04404
			 •
			Published
				
			•- 
				21
			 
 - IntFold: A Controllable Foundation Model for General and Specialized
  Biomolecular Structure Prediction- 
			Paper
			 •- 
			2507.02025
			 •
			Published
				
			•- 
				35
			 
 - Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos
  with Spatio-Temporal Diffusion Models- 
			Paper
			 •- 
			2507.13344
			 •
			Published
				
			•- 
				56
			 
 - Speed Always Wins: A Survey on Efficient Architectures for Large
  Language Models- 
			Paper
			 •- 
			2508.09834
			 •
			Published
				
			•- 
				53
			 
 - Test-Time Scaling in Reasoning Models Is Not Effective for
  Knowledge-Intensive Tasks Yet- 
			Paper
			 •- 
			2509.06861
			 •
			Published
				
			•- 
				8
			 
 - Locality in Image Diffusion Models Emerges from Data Statistics- 
			Paper
			 •- 
			2509.09672
			 •
			Published
				
			•- 
				12
			 
 - Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from
  Token and Parameter Levels- 
			Paper
			 •- 
			2509.16596
			 •
			Published
				
			•- 
				13
			 
 - Self-Improvement in Multimodal Large Language Models: A Survey- 
			Paper
			 •- 
			2510.02665
			 •
			Published
				
			•- 
				19