Shengqiong Wu

ChocoWu

https://chocowu.github.io/

AI & ML interests

Large Language Models, Multimodal Learning, Scene Graph Generation

upvoted 3 papers 4 months ago

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 94

Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 173

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Paper • 2512.11749 • Published Dec 12, 2025 • 39

upvoted a paper 5 months ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39

upvoted 5 papers 6 months ago

Latent Diffusion Model without Variational Autoencoder

Paper • 2510.15301 • Published Oct 17, 2025 • 50

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Paper • 2510.13940 • Published Oct 15, 2025 • 7

AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes

Paper • 2510.10670 • Published Oct 12, 2025 • 19

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Paper • 2510.08555 • Published Oct 9, 2025 • 65

UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution

Paper • 2510.08143 • Published Oct 9, 2025 • 20

upvoted 2 papers 11 months ago

3D Scene Generation: A Survey

Paper • 2505.05474 • Published May 8, 2025 • 21

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 83

upvoted a paper 12 months ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17, 2025 • 20

upvoted 5 papers about 1 year ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Paper • 2503.23377 • Published Mar 30, 2025 • 57

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31, 2025 • 76

Position: Interactive Generative Video as Next-Generation Game Engine

Paper • 2503.17359 • Published Mar 21, 2025 • 61

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16, 2025 • 35

upvoted a paper almost 2 years ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54