AI & ML interests
Natural Language Processing, Image Generation
Papers
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Stress Testing Image Generation Models
Critique-Coder
MMLU-Pro
Reasoning in the pixel space
Advancing LLMs' general reasoning capabilities
Video Mamba
A collection of models and datasets from ABC: Achieving Better Control of Multimodal Embeddings using VLMs.
PixelWorld
The dataset and models for CritiqueFineTuning
- TIGER-Lab/WebInstruct-CFT
  Viewer • Updated • 654k • 128 • 56
- TIGER-Lab/Qwen2.5-Math-7B-CFT
  Text Generation • 8B • Updated • 55 • 8
- TIGER-Lab/Qwen2.5-32B-Instruct-CFT
  Text Generation • 33B • Updated • 7 • 6
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
  Paper • 2501.17703 • Published • 58
The generalist image editing model
The VLM2Vec embedding models.
The datasets and models for the MAmmoTH project
ImagenHub
The structure knowledge grounded language model
Mantis model family optimized for multi-image reasoning with interleaved text/image format
Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
BrowserAgent: An agent that can interact with a browser to complete tasks
VideoScore2
Web explorer model
Converting a VLM to a general embedding model
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
The pioneering work in Dialogue-driven Movie Shot Generation
SoTA VLM for Reasoning
- TIGER-Lab/VL-Rethinker-72B
  Visual Question Answering • 73B • Updated • 76 • 5
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- TIGER-Lab/VL-Rethinker-7B
  Image-Text-to-Text • 8B • Updated • 1.23k • 13
- TIGER-Lab/VL-Reasoner-72B
  Visual Question Answering • 73B • Updated • 4 • 3
Scaling up multimodal data
- TIGER-Lab/VisualWebInstruct-Recall
  Viewer • Updated • 361k • 371 • 4
- TIGER-Lab/VisualWebInstruct-Seed
  Viewer • Updated • 60.3k • 182 • 18
- TIGER-Lab/VisualWebInstruct
  Viewer • Updated • 1.91M • 1.2k • 38
- VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
  Paper • 2503.10582 • Published • 24
The dataset and model for MAmmoTH-VL
Video Augmentation for Synthetic Video Instruction-following Data Generation
- TIGER-Lab/VISTA-LongVA
  Video-Text-to-Text • 8B • Updated • 1 • 2
- TIGER-Lab/VISTA-Mantis
  Video-Text-to-Text • 8B • Updated
- TIGER-Lab/VISTA-VideoLLaVA
  Video-Text-to-Text • 7B • Updated • 1
- VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
  Paper • 2412.00927 • Published • 29
Model variants of TIGERScore checkpoints and the associated dataset
- TIGER-Lab/TIGERScore-13B
  Text Generation • 13B • Updated • 420 • 18
- TIGER-Lab/MetricInstruct
  Viewer • Updated • 42.5k • 83 • 13
- TIGER-Lab/TIGERScore-7B
  Text Generation • 7B • Updated • 26 • 2
- TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
  Paper • 2310.00752 • Published • 3
The dataset and model for the UniIR project
ConsistI2V Image-to-Video generation models
Scaling up instruction data from the web to build better LLMs
Long-context research projects