TreePO - a m-a-p Collection

TreePO

Updated Sep 4, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published Aug 24, 2025 • 80

m-a-p/TreePO-Qwen2.5-7B
Text Generation • 8B • Updated Oct 8, 2025 • 25 • 2

m-a-p/TreePO_data
Viewer • Updated Oct 8, 2025 • 3.12k • 188

m-a-p/TreePO-Qwen2.5-7B_fixed-div
8B • Updated Aug 31, 2025 • 13

m-a-p/TreePO-Qwen2.5-7B_GRPO-TreePO-Sampling
8B • Updated Sep 4, 2025 • 17

m-a-p/TreePO-Qwen2.5-7B_Low_Prob_Encourage
8B • Updated Sep 4, 2025 • 17

m-a-p/TreePO-Qwen2.5-7B_Naive2Low_Scheduler
8B • Updated Sep 4, 2025 • 15