Yuseung "Phillip" Lee

phillipinseoul

https://phillipinseoul.github.io/

AI & ML interests

Computer Vision

Recent Activity

liked a Space about 8 hours ago

lmms-lab-si/EASI-Leaderboard

liked a dataset about 9 hours ago

HuanjinYao/Mulberry-SFT

upvoted a paper 1 day ago

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

Paper • 2512.24385 • Published 4 days ago • 7

upvoted 4 papers 4 days ago

An Information Theoretic Perspective on Agentic System Design

Paper • 2512.21720 • Published 9 days ago • 7

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Paper • 2512.22615 • Published 7 days ago • 39

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Paper • 2512.22322 • Published 8 days ago • 35

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published 8 days ago • 55

upvoted a paper 8 days ago

Latent Implicit Visual Reasoning

Paper • 2512.21218 • Published 10 days ago • 63

upvoted a paper 9 days ago

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published 11 days ago • 48

upvoted 2 papers 10 days ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 16 days ago • 30

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published 11 days ago • 42

upvoted 3 papers 12 days ago

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Paper • 2512.19678 • Published 12 days ago • 29

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Paper • 2512.10863 • Published 23 days ago • 21

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

Paper • 2512.16899 • Published 16 days ago • 12

upvoted 3 papers 13 days ago

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Paper • 2512.16793 • Published 16 days ago • 72

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published 16 days ago • 10

When Reasoning Meets Its Laws

Paper • 2512.17901 • Published 15 days ago • 54

upvoted 4 papers 15 days ago

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Paper • 2512.16918 • Published 16 days ago • 12

Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published 16 days ago • 82

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Paper • 2512.16561 • Published 16 days ago • 19

Adaptation of Agentic AI

Paper • 2512.16301 • Published 16 days ago • 98

upvoted a paper 16 days ago

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Paper • 2512.15713 • Published 17 days ago • 16