Article: Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? (Apr 4)
Article: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) (Jan 19)
Article: 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? (Mar 17)
Paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, published Jan 28, 2025)
Paper: Extending Context Window of Large Language Models via Positional Interpolation (arXiv:2306.15595, published Jun 27, 2023)