Collections
Discover the best community collections!
Collections including paper arxiv:2501.00663

- A New Federated Learning Framework Against Gradient Inversion Attacks
  Paper • 2412.07187 • Published • 3
- Selective Aggregation for Low-Rank Adaptation in Federated Learning
  Paper • 2410.01463 • Published • 19
- Exploring Federated Pruning for Large Language Models
  Paper • 2505.13547 • Published • 14
- It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
  Paper • 2504.13173 • Published • 18

- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
  Paper • 2309.17207 • Published
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 22
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 9
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 22
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34

- Internal Consistency and Self-Feedback in Large Language Models: A Survey
  Paper • 2407.14507 • Published • 46
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 3
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 11
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14

- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 30
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 37

- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 11
- LLM Critics Help Catch LLM Bugs
  Paper • 2407.00215 • Published
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Paper • 2407.21787 • Published • 13
- Generative Verifiers: Reward Modeling as Next-Token Prediction
  Paper • 2408.15240 • Published • 13