william cody stanford

williamcstanford

AI & ML interests

None yet

Organizations

None yet

williamcstanford's collections (18)

video segmentation

Tracking Anything with Decoupled Video Segmentation

Paper • 2309.03903 • Published Sep 7, 2023 • 28
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web

Paper • 2312.16457 • Published Dec 27, 2023 • 15
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Paper • 2312.15770 • Published Dec 25, 2023 • 15

Pearl: A Production-ready Reinforcement Learning Agent

Paper • 2312.03814 • Published Dec 6, 2023 • 15

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 48
Perspectives on the State and Future of Deep Learning - 2023

Paper • 2312.09323 • Published Dec 7, 2023 • 8
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published May 23, 2024 • 41
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Paper • 2407.10718 • Published Jul 15, 2024 • 19

Autonomous agents

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Paper • 2401.13919 • Published Jan 25, 2024 • 32
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 97
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 71

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

Paper • 2402.06178 • Published Feb 9, 2024 • 15

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Paper • 2403.11207 • Published Mar 17, 2024 • 15

DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Paper • 2402.11929 • Published Feb 19, 2024 • 11

Depth Estimation

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51

Code Understanding

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 65

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 50
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

Paper • 2310.16825 • Published Oct 25, 2023 • 36
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 43
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

Paper • 2311.04145 • Published Nov 7, 2023 • 35

Foundation Models in Robotics: Applications, Challenges, and the Future

Paper • 2312.07843 • Published Dec 13, 2023 • 18
3D Diffusion Policy

Paper • 2403.03954 • Published Mar 6, 2024 • 14

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Paper • 2401.04468 • Published Jan 9, 2024 • 49
Anything in Any Scene: Photorealistic Video Object Insertion

Paper • 2401.17509 • Published Jan 30, 2024 • 17
Memory Consolidation Enables Long-Context Video Understanding

Paper • 2402.05861 • Published Feb 8, 2024 • 10
Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14, 2024 • 30

Transformer improvements

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13

video understanding

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27, 2024 • 18
Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 21
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 26

Explorative Inbetweening of Time and Space

Paper • 2403.14611 • Published Mar 21, 2024 • 13
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 29
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Paper • 2402.11929 • Published Feb 19, 2024 • 11
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Paper • 2403.14773 • Published Mar 21, 2024 • 11

singing portraits

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Paper • 2403.17694 • Published Mar 26, 2024 • 12

Cellular Automata DL

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Paper • 2407.09413 • Published Jul 12, 2024 • 11

