Guanzhou Ke

guanzhouk

Guanzhou-Ke

AI & ML interests

Multi-modal learning

Organizations

None yet

Recent Activity

upvoted a paper 6 days ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published 15 days ago • 16

upvoted a paper 7 days ago

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published 9 days ago • 61

upvoted a paper 8 days ago

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published 10 days ago • 59

upvoted 2 papers 18 days ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published 20 days ago • 252

Training-Free Group Relative Policy Optimization

Paper • 2510.08191 • Published 20 days ago • 43

upvoted 2 papers 21 days ago

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them

Paper • 2509.21117 • Published Sep 25 • 29

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25 • 88

upvoted 3 papers about 1 month ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 96

Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 133

Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12 • 26

upvoted 4 papers about 2 months ago

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9 • 82

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8 • 40

Interleaving Reasoning for Better Text-to-Image Generation

Paper • 2509.06945 • Published Sep 8 • 14

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

upvoted 6 papers 2 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 201

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Paper • 2508.18032 • Published Aug 25 • 41

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 127

Ovis2.5 Technical Report

Paper • 2508.11737 • Published Aug 15 • 109

DINOv3

Paper • 2508.10104 • Published Aug 13 • 274

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Paper • 2508.10711 • Published Aug 14 • 142