Collections
Discover the best community collections!
Collections including paper arxiv:2501.00663

- A New Federated Learning Framework Against Gradient Inversion Attacks
  Paper • 2412.07187 • Published • 3
- Selective Aggregation for Low-Rank Adaptation in Federated Learning
  Paper • 2410.01463 • Published • 19
- Exploring Federated Pruning for Large Language Models
  Paper • 2505.13547 • Published • 14
- It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
  Paper • 2504.13173 • Published • 18

- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
  Paper • 2309.17207 • Published
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 22
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 9
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 22
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34

- Internal Consistency and Self-Feedback in Large Language Models: A Survey
  Paper • 2407.14507 • Published • 46
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 3
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 11
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14

- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 30
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 26
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 37

- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 11
- LLM Critics Help Catch LLM Bugs
  Paper • 2407.00215 • Published
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Paper • 2407.21787 • Published • 13
- Generative Verifiers: Reward Modeling as Next-Token Prediction
  Paper • 2408.15240 • Published • 13