arxiv:2510.24718

Generative View Stitching

Published on Oct 28
· Submitted by Chonghyuk Song on Oct 30

Abstract

AI-generated summary: Generative View Stitching (GVS) enables stable, collision-free, and temporally consistent camera-guided video generation by sampling sequences in parallel and conditioning on both past and future frames.

Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion framework that we show already provides the affordances necessary for stitching. We then introduce Omni Guidance, a technique that enhances the temporal consistency in stitching by conditioning on both the past and future, and that enables our proposed loop-closing mechanism for delivering long-range coherence. Overall, GVS achieves camera-guided video generation that is stable, collision-free, frame-to-frame consistent, and closes loops for a variety of predefined camera paths, including Oscar Reutersvärd's Impossible Staircase. Results are best viewed as videos at https://andrewsonga.github.io/gvs.
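The parallel, stitched sampling described above lends itself to a short sketch. The code below is only an illustration of the general idea under stated assumptions, not the authors' implementation: it assumes a hypothetical `denoise_window` function (one reverse-diffusion step on a window of frames given its camera poses and the current noise level) and stitches overlapping windows by averaging their predictions, so every shared frame is denoised with both its past and its future neighbors in view.

```python
# Minimal sketch of stitched, bidirectionally conditioned sampling.
# Illustrative only; `denoise_window` is a hypothetical callable, not part of GVS.

import torch


def stitched_sample(denoise_window, cameras, window=16, overlap=4, steps=50,
                    frame_shape=(3, 64, 64)):
    """Denoise every frame of a long trajectory in parallel, window by window.

    cameras: tensor of shape (T, ...) holding the predefined camera poses.
    Overlapping windows share frames, so each shared frame sees both its past
    and its future neighbors; averaging the overlapping predictions stitches
    the windows into one consistent sequence. Assumes T >= window.
    """
    T = cameras.shape[0]
    x = torch.randn(T, *frame_shape)  # noisy latent video (toy resolution)

    # Window start indices with the requested overlap; make sure the tail of
    # the trajectory is covered even when T does not align with the stride.
    starts = list(range(0, T - window + 1, window - overlap))
    if starts[-1] != T - window:
        starts.append(T - window)

    for step in range(steps, 0, -1):
        pred = torch.zeros_like(x)
        count = torch.zeros(T)
        for s in starts:
            sl = slice(s, s + window)
            # One denoising step for this window, conditioned on its cameras.
            pred[sl] += denoise_window(x[sl], cameras[sl], step)
            count[sl] += 1
        # Average where windows overlap so shared frames agree with both the
        # preceding and the following window.
        x = pred / count.view(-1, 1, 1, 1)
    return x
```

On top of this basic stitching, the paper's full method adds Omni Guidance and a loop-closing mechanism for long-range coherence; see the paper and the code repository linked below for the actual algorithm.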

Community

Paper author and submitter:

TL;DR: Generative View Stitching is a non-autoregressive sampling method for video length extrapolation that enables collision-free camera-guided video diffusion for predefined trajectories, including Oscar Reutersvärd's Impossible Staircase.

Project Page: https://andrewsonga.github.io/gvs/
Code: https://github.com/andrewsonga/generative_view_stitching

