LMMs-Lab

community

https://www.lmms-lab.com/

lmmslab

EvolvingLMMs-Lab

AI & ML interests

Feeling and building the multimodal intelligence.

Recent Activity

xiangan authored a paper about 12 hours ago

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

xiangan authored a paper about 12 hours ago

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

luodian updated a Space about 15 hours ago

lmms-lab/README

Papers

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

xiangan

authored 2 papers about 12 hours ago

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Paper • 2510.18795 • Published Oct 21, 2025 • 11

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Paper • 2601.10305 • Published 28 days ago • 36

luodian

updated a Space about 15 hours ago

README

xiangan

updated a collection 2 days ago

OneVision-Encoder

HEVC-Style Vision Transformer • 2 items • Updated 2 days ago • 2

yiyexy

updated a model 5 days ago

lmms-lab/LLaVA-OneVision-1.5-4B-Instruct

Image-Text-to-Text • 5B • Updated 5 days ago • 2.55k • 16

Yuwei-Niu

authored a paper 15 days ago

iFSQ: Improving FSQ for Image Generation with 1 Line of Code

Paper • 2601.17124 • Published 19 days ago • 32

luodian

published a dataset 20 days ago

lmms-lab/Spatial457

Updated 20 days ago • 10

kcz358

in lmms-lab/sae-sample-cache-dataset 27 days ago

Why there are so many duplicated images?

#1 opened 29 days ago by

luodian

updated a collection about 2 months ago

OneVision-Encoder

HEVC-Style Vision Transformer • 2 items • Updated 2 days ago • 2

Paranioar

authored a paper about 2 months ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 65

mwxely

authored a paper about 2 months ago

A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models

Paper • 2511.15098 • Published Nov 19, 2025

kcz358

updated a collection 2 months ago

LongVT

8 items • Updated Dec 11, 2025 • 8

kcz358

updated a model 2 months ago

lmms-lab/BAGEL-7B-MoT-ver.LE

Text Generation • 15B • Updated Dec 8, 2025 • 3.75k • 1

kcz358

updated a Space 2 months ago

README

kcz358

authored a paper 2 months ago

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 185

mwxely

authored 2 papers 2 months ago

FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy

Paper • 2305.10307 • Published May 17, 2023

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 185

russwang

authored a paper 3 months ago

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

Paper • 2511.21662 • Published Nov 26, 2025 • 11

mwxely

authored a paper 3 months ago

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding

Paper • 2508.01699 • Published Aug 3, 2025