william cody stanford
williamcstanford
AI & ML interests
None yet
Organizations
None yet
RL
LLMs
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 48
- Perspectives on the State and Future of Deep Learning - 2023
Paper • 2312.09323 • Published • 8
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Paper • 2405.15071 • Published • 41
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
Paper • 2407.10718 • Published • 19
Autonomous agents
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Paper • 2401.13919 • Published • 32
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 13
- Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 97
- LLM Agent Operating System
Paper • 2403.16971 • Published • 71
Music gen
brain
relighting
Depth Estimation
Code Understanding
diffusion
- A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 50
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 36
- Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 43
- I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Paper • 2311.04145 • Published • 35
robotics
video gen
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Paper • 2401.04468 • Published • 49
- Anything in Any Scene: Photorealistic Video Object Insertion
Paper • 2401.17509 • Published • 17
- Memory Consolidation Enables Long-Context Video Understanding
Paper • 2402.05861 • Published • 10
- Magic-Me: Identity-Specific Video Customized Diffusion
Paper • 2402.09368 • Published • 30
Transformer improvements
video understanding
- VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper • 2402.13217 • Published • 37
- Sora Generates Videos with Stunning Geometrical Consistency
Paper • 2402.17403 • Published • 18
- Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21
- VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Paper • 2406.16338 • Published • 26
MUST FOLLOWS
- Explorative Inbetweening of Time and Space
Paper • 2403.14611 • Published • 13
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
Paper • 2402.11929 • Published • 11
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Paper • 2403.14773 • Published • 11
singing portraits
Cellular Automata DL
datasets
video segmentation
- Tracking Anything with Decoupled Video Segmentation
Paper • 2309.03903 • Published • 28
- City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
Paper • 2312.16457 • Published • 15
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Paper • 2312.15770 • Published • 15