minhtt32 (Tan Minh Tran)

upvoted an article 2 months ago

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

Article • Jun 11 • 116

upvoted a collection 2 months ago

InternVL3.5

Collection

This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28 • 103

upvoted a paper 3 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 207

upvoted a paper 4 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 316

upvoted an article 5 months ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • Jun 3 • 284

upvoted 2 papers 7 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 302

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132

upvoted 2 papers 8 months ago

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Paper • 2503.12533 • Published Mar 16 • 68

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88

upvoted 7 papers 9 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 74

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Paper • 2502.20395 • Published Feb 27 • 46

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Paper • 2502.19634 • Published Feb 26 • 63

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 84

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 89

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published Mar 7 • 57

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 123

upvoted 2 articles 10 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 886

We now support VLMs in smolagents!

Article • Jan 24 • 110

upvoted 2 papers 11 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376
