license: apache-2.0
UNet with Sliding Window Attention
- 8-channel latent, obtained by moving modules from WF-VAE into the NoobAI XL VAE
- supports recent long-context CLIP models
- the number of heads (num_head) in multi-head attention (MHA) varies across layers
- both the UNet and the Autoencoder are written in vanilla PyTorch
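
As a rough illustration of two of the points above, here is a minimal, dependency-free sketch. It is not taken from this repository. It assumes a 1D token sequence with a symmetric window, which may differ from how the UNet actually windows its attention. The names `sliding_window_mask` and `per_layer_head_dim` are hypothetical, as are the channel and head counts in the example.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a seq_len x seq_len mask; True means query i may attend to key j.

    Assumption: tokens form a 1D sequence and the window is symmetric,
    so each query sees at most `window` neighbours on either side.
    """
    if seq_len < 0 or window < 0:
        raise ValueError("seq_len and window must be non-negative")
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def per_layer_head_dim(channels: list[int], num_heads: list[int]) -> list[int]:
    """Return the per-head width for each layer when num_head varies by layer."""
    if len(channels) != len(num_heads):
        raise ValueError("channels and num_heads must have the same length")
    dims = []
    for c, h in zip(channels, num_heads):
        if h <= 0 or c % h != 0:
            raise ValueError(f"{c} channels cannot be split into {h} heads")
        dims.append(c // h)
    return dims


# Hypothetical example values, chosen only for illustration.
mask = sliding_window_mask(seq_len=6, window=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

head_dims = per_layer_head_dim(channels=[320, 640, 1280], num_heads=[5, 10, 20])
print(head_dims)
```

In PyTorch, a boolean mask of this shape can be converted to a tensor and passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention.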
The result is similar to what Mitsua accomplished back then.
References
- arXiv:2411.17459