Open to Collab

Muhammad Umair

umair894

AI & ML interests

Multimodal Re-identification | Feature Upscaling | Cross-modal Alignment | Robust Generalization | PhD, UESTC

Recent Activity


upvoted 2 papers 3 days ago

MiniMax Sparse Attention

Paper • 2606.13392 • Published 4 days ago • 119

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Paper • 2606.11926 • Published 5 days ago • 109

upvoted 3 papers 5 days ago

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Paper • 2606.07502 • Published 10 days ago • 91

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Paper • 2606.10804 • Published 6 days ago • 41

ABot-Earth 0.5: Generative 3D Earth Model

Paper • 2606.09967 • Published 7 days ago • 463

liked a Space 10 days ago

RF-DETR Realtime Webcam Demo

🎯

Segment objects in live webcam and uploaded media

upvoted a paper 10 days ago

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Paper • 2606.05112 • Published 12 days ago • 3

upvoted 4 papers 13 days ago

Multi-Agent Computer Use

Paper • 2606.01533 • Published 14 days ago • 7

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

Paper • 2606.01247 • Published 15 days ago • 30

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published 19 days ago • 25

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published 14 days ago • 228

liked 2 Spaces 13 days ago

NV-Generate Synthetic Medical Imaging

🧠

Synthetic 3D CT and MR generation with NVIDIA NV-Generate.

LocateAnything

💬

266

Detect and label objects in images and videos

liked a model 14 days ago

nvidia/LocateAnything-3B

Image-Text-to-Text • 4B • Updated 3 days ago • 75.2k • 2k

liked 2 Spaces 15 days ago

LTX 2.3 Studio

🎬

243

Generate videos from text, images, audio, or video clips

Omni-Video-Factory-API-iframe

🐠

118

Access video creation tools via an embedded interface

liked a model 15 days ago

lintw/HealthGPT-Pro-4B

Image-Text-to-Image • 4B • Updated May 5 • 44 • 3

upvoted 3 papers 16 days ago

Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments

Paper • 2605.22189 • Published 25 days ago • 8

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

Paper • 2605.29341 • Published 18 days ago • 18

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 18 days ago • 60