Elucidating the design space of language models for image generation
This is the model release for the paper (arXiv:2410.16257). You can find the paper on arXiv and the code on GitHub.
We provide four Binary Autoencoder (BAE) tokenizers, following Binary Latent Diffusion, with code dimensions 16 (with and without Bernoulli sampling), 20, and 24. Each was trained for 1,000,000 iterations with a batch size of 256. A sketch of the binarization step is given after the table.
| Code Dim | Bernoulli Sampling | Link | Size |
|---|---|---|---|
| 16 | ✅ | link | 332MB |
| 16 | ❌ | link | 332MB |
| 20 | ✅ | link | 332MB |
| 24 | ✅ | link | 332MB |
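
The Bernoulli-sampling column refers to how the encoder's outputs are binarized. As a rough illustration only (the function and argument names below are hypothetical, not this repository's API), the quantizer can either sample each bit from its predicted probability or threshold deterministically, with a straight-through estimator for gradients:

```python
import torch

def binary_quantize(logits: torch.Tensor, bernoulli_sampling: bool = True) -> torch.Tensor:
    # Hypothetical sketch of BAE binarization, assuming the encoder
    # emits per-bit logits, e.g. of shape (batch, code_dim, H, W).
    probs = torch.sigmoid(logits)              # per-bit probabilities in (0, 1)
    if bernoulli_sampling:
        bits = torch.bernoulli(probs)          # stochastic: sample each bit
    else:
        bits = (probs > 0.5).float()           # deterministic: hard threshold
    # Straight-through estimator: the forward pass uses the hard bits,
    # while gradients flow back through the probabilities.
    return probs + (bits - probs).detach()
```

In Binary Latent Diffusion, the stochastic variant acts as a regularizer on the latent codes; which variant a checkpoint was trained with is what the table's second column indicates.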
The generation model architecture is adapted from Llama 2, following LlamaGen.
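
For intuition, this amounts to a standard decoder-only Llama-style transformer autoregressing over image tokens. Below is a minimal sketch using Hugging Face's `LlamaConfig` and `LlamaForCausalLM`; the release ships its own model code, and every value here is an illustrative placeholder (including the assumption of a lookup vocabulary over binary codes), not the paper's configuration:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative placeholders only; the released code defines its own
# Llama-adapted classes and hyperparameters.
config = LlamaConfig(
    vocab_size=2 ** 16,              # e.g. all possible 16-bit binary codes
    hidden_size=1024,
    intermediate_size=2752,
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=257,     # e.g. a 16x16 token grid + one condition token
)
model = LlamaForCausalLM(config)     # next-token prediction over image codes
```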