---
{}
---

# Model Card for Initial Noise Loader for Stable Diffusion XL

This custom pipeline provides an initial noise loader (`NoiseLoaderMixin`, inspired by the LoRA and textual inversion loaders in the diffusers library) for the Stable Diffusion XL architecture. It lets you change the distribution of the initial noise that the generation process starts from with a single line of code: `custom_pipeline.load_initial_noise_modifier(…)`.

Currently implemented methods:

- Start generation from a fixed noise tensor.

  Example: `custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=…)`

- Golden Noise for Diffusion Models: A Learning Framework (Zhou et al., https://arxiv.org/abs/2411.09502).

  Example: `custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path=…)`

- General Normal Distribution: sample the initial noise from a user-defined general normal distribution with per-channel means and standard deviations.

  Example: `custom_pipeline.load_initial_noise_modifier(method="general-normal-distribution", init_noise_mean=(0, -0.1, 0.2, 0), init_noise_std=(1, 1, 1, 1))`

Demo Notebook: [Open in Colab](https://colab.research.google.com/drive/1-owYN8r2TbT-Je_eTEpnIMLj1nvxPYqI#scrollTo=HQS6OQ44jz66)

## Citation

If you find my code useful, you may cite:

```
@misc{initial_noise,
  author = {Syrine Noamen},
  title = {Initial Noise Loader for Stable Diffusion XL - HuggingFace},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/syrinenoamen/stable-diffusion_xl_initial_noise_loader}},
}
```

## Example 1: Start generation from a fixed noise

This example is mostly for demonstration, since the same behavior is easy to achieve in the diffusers library by passing a seeded `generator` to the pipeline call.
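
For reference, a minimal sketch of the vanilla diffusers equivalent (the prompt here is illustrative):

```
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# A seeded generator fixes the initial latent noise for this call.
generator = torch.Generator(device="cuda").manual_seed(12345)
image = pipe("a photo of a mountain at sunrise", generator=generator).images[0]
```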

### Uses

```
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)

# Every subsequent generation starts from the noise produced by this seed.
custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=12345)
```
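
Once the modifier is loaded, generation proceeds as usual. Because every call starts from the same initial noise, repeated calls with identical prompts and sampler settings should reproduce the same image (the prompt is illustrative):

```
image_a = custom_pipeline("a photo of a mountain at sunrise").images[0]
image_b = custom_pipeline("a photo of a mountain at sunrise").images[0]
# Both generations start from identical initial noise, so with a
# deterministic sampler the two images should match.
```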



## Example 2: Golden Noise for Diffusion Models: A Learning Framework (Zhou et al., https://arxiv.org/abs/2411.09502)

### Requirements

```
pip install timm einops
```

### Uses

```
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)
```
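
With the pipeline loaded, attach the golden-noise modifier as in the method summary above. The checkpoint filename below is a hypothetical placeholder; point `npnet_path` at your local copy of the NPNet weights from the Golden Noise repository:

```
# "npnet.pth" is a placeholder; use your local copy of the NPNet
# weights released with the Golden Noise paper.
custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path="npnet.pth")
```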



### Golden Noise Citation

Code adapted from the [Golden Noise GitHub repository](https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models).

```
@misc{zhou2024goldennoisediffusionmodels,
  title = {Golden Noise for Diffusion Models: A Learning Framework},
  author = {Zikai Zhou and Shitong Shao and Lichen Bai and Zhiqiang Xu and Bo Han and Zeke Xie},
  year = {2024},
  eprint = {2411.09502},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2411.09502},
}
```

## Example 3: General Normal Distribution

SDXL latents are 4-channel tensors with interpretable semantics: channel 1 primarily encodes luminance (overall brightness), channel 2 captures the cyan–red color axis, channel 3 the green–blue axis, and channel 4 structure and patterns.

By manipulating the mean values of these channels, particularly those associated with color, you can bias the generation process toward specific visual tones or styles. This allows a degree of control over the image's color palette directly in the latent space, without modifying the text prompt or conditioning vectors.
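
A minimal sketch of this, reusing the example values from the method summary above (the per-channel means are illustrative, not tuned settings):

```
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)

# One (mean, std) entry per latent channel; shifting the means of the
# color channels biases the palette of the generated images.
custom_pipeline.load_initial_noise_modifier(
    method="general-normal-distribution",
    init_noise_mean=(0, -0.1, 0.2, 0),
    init_noise_std=(1, 1, 1, 1),
)
```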

<div style="display: flex; justify-content: space-between; align-items: center;">
  <div style="text-align: center; flex: 1; margin-right: 10px;">
    <img src="examples/blue_mountain.png" alt="Blue, purple tone" style="width:100%;">
    <p><em>(a) Biased toward blue and purple tones</em></p>
  </div>
  <div style="text-align: center; flex: 1; margin-left: 10px;">
    <img src="examples/orange_mountain.png" alt="Red, orange tone" style="width:100%;">
    <p><em>(b) Biased toward red and orange tones</em></p>
  </div>
</div>

<p style="text-align: center;"><strong>Figure:</strong> Controlling the latent space color distribution</p>
|