arxiv:2510.20766

DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Published on Oct 23 · Submitted by Guy Yariv on Oct 24

Abstract

AI-generated summary:

Dynamic Position Extrapolation (DyPE) enhances ultra-high-resolution image generation by dynamically adjusting positional encodings in pre-trained diffusion transformers, achieving state-of-the-art fidelity without additional sampling cost.

Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum to the current stage of the generative process. This approach allows us to generate images at resolutions that dramatically exceed the training resolution, e.g., 16 million pixels using FLUX. On multiple benchmarks, DyPE consistently improves performance and achieves state-of-the-art fidelity in ultra-high-resolution image generation, with gains becoming even more pronounced at higher resolutions. Project page is available at https://noamissachar.github.io/DyPE/.
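
The core mechanism lends itself to a compact sketch. The snippet below is a minimal illustration of the idea rather than the authors' implementation: it assumes a RoPE-style positional encoding (as used by FLUX) and a hypothetical schedule that applies full NTK-aware frequency rescaling early in sampling, when only low-frequency structure is forming, and relaxes toward the native encoding as denoising completes. Function names, the schedule, and the interpolation exponent are all illustrative assumptions.

```python
import torch

def rope_frequencies(dim: int, theta: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for one axis with head size `dim`."""
    return 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))

def dype_frequencies(dim: int, t: float, train_len: int, target_len: int,
                     theta: float = 10000.0) -> torch.Tensor:
    """Hypothetical DyPE-style schedule (illustrative, not the paper's formula):
    stretch positional frequencies via NTK-aware rescaling at high noise levels,
    where only low-frequency structure is being resolved, and interpolate back
    to the native encoding as t -> 0 so late steps see the original spectrum.

    t: diffusion time in [0, 1], with 1 = pure noise and 0 = clean image.
    """
    scale = target_len / train_len                      # e.g. 256 / 64 = 4x extrapolation
    theta_t = theta * scale ** (dim / (dim - 2) * t)    # t=1: full NTK rescale; t=0: native
    return 1.0 / (theta_t ** (torch.arange(0, dim, 2).float() / dim))

def rotary_tables(positions: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """cos/sin tables for the given positions and (time-dependent) frequencies."""
    angles = positions[:, None].float() * freqs[None, :]
    return torch.cat([angles.cos(), angles.sin()], dim=-1)

# At each sampling step the tables are rebuilt with the current t before the
# attention pass -- no retraining, and the cos/sin tables are cheap to recompute.
positions = torch.arange(256)   # one axis of a 256x256 token grid (4x the training axis)
early = rotary_tables(positions, dype_frequencies(64, t=1.0, train_len=64, target_len=256))
late  = rotary_tables(positions, dype_frequencies(64, t=0.0, train_len=64, target_len=256))

# Sanity check: at t=0 the schedule reduces to the model's native RoPE.
assert torch.allclose(dype_frequencies(64, t=0.0, train_len=64, target_len=256),
                      rope_frequencies(64))
```

At t = 0 this schedule reduces exactly to the original RoPE, which is one reasonable way to read the paper's claim that high frequencies are resolved late, with the encoding's spectrum matched to the current stage of generation.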

Community

Paper author · Paper submitter

DyPE (Dynamic Position Extrapolation) enables pre-trained diffusion transformers to generate ultra-high-resolution images far beyond their training scale. It dynamically adjusts positional encodings during denoising to match the evolving frequency content, achieving faithful 4K × 4K results without retraining or extra sampling cost.

Project page: https://noamissachar.github.io/DyPE/
Code: https://github.com/guyyariv/DyPE
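
For a rough sense of scale behind the 4K × 4K claim, here is a back-of-the-envelope token count. The ~16 pixels per token per axis assumes an 8× VAE plus 2×2 patchification, which is an assumption about the base model rather than a number from the paper:

```python
def attention_cost(px: int, px_per_token: int = 16) -> tuple[int, int]:
    """Token count and pairwise self-attention interactions for a square image,
    assuming ~16 pixels per token per axis (hypothetical: 8x VAE * 2x2 patches)."""
    tokens = (px // px_per_token) ** 2
    return tokens, tokens ** 2

print(attention_cost(1024))  # (4096, 16777216)     -- a typical training resolution
print(attention_cost(4096))  # (65536, 4294967296)  -- 4096^2 ~= 16.8 MP, 256x the attention work
```

That quadratic blow-up is why training directly at 16 megapixels is costly, and why a training-free adjustment of the positional encoding alone is attractive.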

Congrats! I've been looking for this. The high level of detail in these images gives them an artistic aesthetic, the kind that invites people to glance a second time with a sense of discovery and exploration. Thanks for helping us add texture and depth to pull the viewer in.
