ai-forever committed on
Commit 504e174 · verified · 1 Parent(s): 555d748

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/comfyui_kandinsky5.png filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_10s_vs_kandinsky_5_video_lite_distill_10s.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_5s_vs_kandinsky_5_video_lite_distill_5s.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_vs_sora.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_1.3B.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_14B.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_5B.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_A14B.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/vbench.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,230 +1,3 @@
- ---
- license: apache-2.0
- ---
-
- ---
- license: apache-2.0
- ---
-
- <div align="center">
- <picture>
- <img src="assets/KANDINSKY_LOGO_1_BLACK.png">
- </picture>
- </div>
-
- <div align="center">
- <a href="https://habr.com/ru/companies/sberbank/articles/951800/">Habr</a> | <a href="https://ai-forever.github.io/Kandinsky-5/">Project Page</a> | Technical Report (soon) | <a href="https://github.com/ai-forever/Kandinsky-5">Original Github</a>
- </div>
-
- # Kandinsky 5.0 T2V Lite - Diffusers
-
- This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class.
-
- ## Project Updates
-
- - **2025/09/29**: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model.
- - **Diffusers Integration**: Now available with easy-to-use 🤗 Diffusers pipeline!
-
- ## Kandinsky 5.0 T2V Lite
-
- Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem.
-
- We provide 8 model variants, each optimized for different use cases:
-
- * **SFT model** — delivers the highest generation quality
- * **CFG-distilled** — runs 2× faster
- * **Diffusion-distilled** — enables low-latency generation with minimal quality loss (6× faster)
- * **Pretrain model** — designed for fine-tuning by researchers and enthusiasts
-
- All models are available in two versions: for generating 5-second and 10-second videos.
-
- ## Pipeline
-
- **Latent diffusion pipeline** with **Flow Matching**.
-
- **Diffusion Transformer (DiT)** as the main generative backbone with **cross-attention to text embeddings**.
-
- - **Qwen2.5-VL** and **CLIP** provides text embeddings
- - **HunyuanVideo 3D VAE** encodes/decodes video into a latent space
- - **DiT** is the main generative module using cross-attention to condition on text
-
- <div align="center">
- <img width="1600" height="477" alt="Pipeline Architecture" src="https://github.com/user-attachments/assets/17fc2eb5-05e3-4591-9ec6-0f6e1ca397b3" />
- </div>
-
- <div align="center">
- <img width="800" height="406" alt="Model Architecture" src="https://github.com/user-attachments/assets/f3006742-e261-4c39-b7dc-e39330be9a09" />
- </div>
-
- ## Basic Usage
-
- ```python
- import torch
- from diffusers import Kandinsky5T2VPipeline
- from diffusers.utils import export_to_video
-
- # Load the pipeline
- pipe = Kandinsky5T2VPipeline.from_pretrained(
- "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s",
- torch_dtype=torch.float16
- )
- pipe = pipe.to("cuda")
-
- # Generate video
- prompt = "A cat and a dog baking a cake together in a kitchen."
- negative_prompt = "Bright tones, overexposed, static, blurred details"
-
- output = pipe(
- prompt=prompt,
- negative_prompt=negative_prompt,
- height=512,
- width=768,
- num_frames=25,
- num_inference_steps=50,
- guidance_scale=5.0,
- ).frames[0]
-
- # Save the video
- export_to_video(output, "output.mp4", fps=6)
- ```
-
- ## Using Different Model Variants
- ```python
- import torch
- from diffusers import Kandinsky5T2VPipeline
-
- # SFT 5s model (highest quality)
- pipe_sft = Kandinsky5T2VPipeline.from_pretrained(
- "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s",
- torch_dtype=torch.float16
- )
-
- # Distilled 16-step model (fastest)
- pipe_distill = Kandinsky5T2VPipeline.from_pretrained(
- "ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s",
- torch_dtype=torch.float16
- )
-
- # No-CFG model (balanced speed/quality)
- pipe_nocfg = Kandinsky5T2VPipeline.from_pretrained(
- "ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s",
- torch_dtype=torch.float16
- )
-
- # Pretrain model (largest diversity, not aligned)
- pipe_pretrain = Kandinsky5T2VPipeline.from_pretrained(
- "ai-forever/Kandinsky-5.0-T2V-Lite-pretrain-5s",
- torch_dtype=torch.float16
- )
- ```
-
- ### Examples:
- #### Kandinsky 5.0 T2V Lite SFT
- <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
- <tr>
- <td>
- <video src="https://github.com/user-attachments/assets/bc38821b-f9f1-46db-885f-1f70464669eb" width=200 controls autoplay loop></video>
- </td>
- <td>
- <video src="https://github.com/user-attachments/assets/9f64c940-4df8-4c51-bd81-a05de8e70fc3" width=200 controls autoplay loop></video>
- </td>
- <tr>
- <td>
- <video src="https://github.com/user-attachments/assets/77dd417f-e0bf-42bd-8d80-daffcd054add" width=200 controls autoplay loop></video>
- </td>
- <td>
- <video src="https://github.com/user-attachments/assets/385a0076-f01c-4663-aa46-6ce50352b9ed" width=200 controls autoplay loop></video>
- </td>
- <tr>
- <td>
- <video src="https://github.com/user-attachments/assets/7c1bcb31-cc7d-4385-9a33-2b0cc28393dd" width=200 controls autoplay loop></video>
- </td>
- <td>
- <video src="https://github.com/user-attachments/assets/990a8a0b-2df1-4bbc-b2e3-2859b6f1eea6" width=200 controls autoplay loop></video>
- </td>
- </tr>
- </table>
- #### Kandinsky 5.0 T2V Lite Distill
- <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
- <tr>
- <td>
- <video src="https://github.com/user-attachments/assets/861342f9-f576-4083-8a3b-94570a970d58" width=200 controls autoplay loop></video>
- </td>
- <td>
- <video src="https://github.com/user-attachments/assets/302e4e7d-781d-4a58-9b10-8c473d469c4b" width=200 controls autoplay loop></video>
- </td>
- <tr>
- <td>
- <video src="https://github.com/user-attachments/assets/3e70175c-40e5-4aec-b506-38006fe91a76" width=200 controls autoplay loop></video>
- </td>
- <td>
- <video src="https://github.com/user-attachments/assets/b7da85f7-8b62-4d46-9460-7f0e505de810" width=200 controls autoplay loop></video>
- </td>
- </table>
- ### Results:
- #### Side-by-Side evaluation
- The evaluation is based on the expanded prompts from the [Movie Gen benchmark](https://github.com/facebookresearch/MovieGenBench), which are available in the expanded_prompt column of the benchmark/moviegen_bench.csv file.
- <table border="0" style="width: 400; text-align: left; margin-top: 20px;">
- <tr>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_vs_sora.jpg" width=400 ></img>
- </td>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_14B.jpg" width=400 ></img>
- </td>
- <tr>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_5B.jpg" width=400 ></img>
- </td>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_A14B.jpg" width=400 ></img>
- </td>
- <tr>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_1.3B.jpg" width=400 ></img>
- </td>
- </table>
- #### Distill Side-by-Side evaluation
- <table border="0" style="width: 400; text-align: left; margin-top: 20px;">
- <tr>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_5s_vs_kandinsky_5_video_lite_distill_5s.jpg" width=400 ></img>
- </td>
- <td>
- <img src="assets/sbs/kandinsky_5_video_lite_10s_vs_kandinsky_5_video_lite_distill_10s.jpg" width=400 ></img>
- </td>
- </table>
- #### VBench results
- <div align="center">
- <picture>
- <img src="assets/vbench.png">
- </picture>
- </div>
-
-
- # Citation
- ```
- @misc{kandinsky2025,
- author = {Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov,
- Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim,
- Anastasiia Kargapoltseva, Nikita Kiselev, Vladimir Arkhipkin, Vladimir Korviakov,
- Nikolai Gerasimenko, Denis Parkhomenko, Anna Dmitrienko, Anastasia Maltseva,
- Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov,
- Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina,
- Tatiana Nikulina, Polina Gavrilova, Denis Dimitrov},
- title = {Kandinsky 5.0: A family of diffusion models for Video & Image generation},
- howpublished = {\url{https://github.com/ai-forever/Kandinsky-5}},
- year = 2025
- }
- @misc{mikhailov2025nablanablaneighborhoodadaptiveblocklevel,
- title={$\nabla$NABLA: Neighborhood Adaptive Block-Level Attention},
- author={Dmitrii Mikhailov and Aleksey Letunovskiy and Maria Kovaleva and Vladimir Arkhipkin
- and Vladimir Korviakov and Vladimir Polovnikov and Viacheslav Vasilev
- and Evelina Sidorova and Denis Dimitrov},
- year={2025},
- eprint={2507.13546},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2507.13546},
- }
- ```
+ ---
+ license: apache-2.0
+ ---
assets/KANDINSKY_LOGO_1_BLACK.png ADDED
assets/KANDINSKY_LOGO_1_WHITE.png ADDED
assets/comfyui_kandinsky5.png ADDED

Git LFS Details

  • SHA256: 4c91961abe51a1fcbd3a35d438ea3b4f652f61a4d9f035c9f10e91dc5c9b79cd
  • Pointer size: 131 Bytes
  • Size of remote file: 474 kB
assets/sbs/kandinsky_5_video_lite_10s_vs_kandinsky_5_video_lite_distill_10s.jpg ADDED

Git LFS Details

  • SHA256: a6f53623b3c1e1f45ea6872f3afa4b3f71d79377bc89065b12e590c8a1a60f1d
  • Pointer size: 131 Bytes
  • Size of remote file: 190 kB
assets/sbs/kandinsky_5_video_lite_5s_vs_kandinsky_5_video_lite_distill_5s.jpg ADDED

Git LFS Details

  • SHA256: 81d9aa99a224f3b1ce7262edf0c969bebcb7b95349cb5b57be5cc7aecbcc15d9
  • Pointer size: 131 Bytes
  • Size of remote file: 192 kB
assets/sbs/kandinsky_5_video_lite_vs_sora.jpg ADDED

Git LFS Details

  • SHA256: 2a5c838cb53a026a57d3037361ad4ed74bae4b31f4d1b11e6474956eca42d412
  • Pointer size: 131 Bytes
  • Size of remote file: 195 kB
assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_1.3B.jpg ADDED

Git LFS Details

  • SHA256: 74fa68588e7e24fd817cc8e96d63f4e5b623ff193c71a644c0ce42ebb9b49dac
  • Pointer size: 131 Bytes
  • Size of remote file: 170 kB
assets/sbs/kandinsky_5_video_lite_vs_wan_2.1_14B.jpg ADDED

Git LFS Details

  • SHA256: 80bc261b9afcaf1446228a24a96afe3b5c24b4780f3e2f43e27496077611ec6f
  • Pointer size: 131 Bytes
  • Size of remote file: 196 kB
assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_5B.jpg ADDED

Git LFS Details

  • SHA256: d01f4a73b287541487228939fd505a947b78b6325f76421b2ee5f1523188e08e
  • Pointer size: 131 Bytes
  • Size of remote file: 192 kB
assets/sbs/kandinsky_5_video_lite_vs_wan_2.2_A14B.jpg ADDED

Git LFS Details

  • SHA256: 4f053f7d996112f40e8b49f6440ea75a40f71c02e60d467cff479ced0b54444a
  • Pointer size: 131 Bytes
  • Size of remote file: 198 kB
assets/vbench.png ADDED

Git LFS Details

  • SHA256: 27131bac1ccb83d3d28e8f558c6a7a91ed92816c0814583299b8584f0cda6546
  • Pointer size: 131 Bytes
  • Size of remote file: 170 kB
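
Every asset above reports a pointer size of 131 bytes. That figure is consistent with the Git LFS pointer format, where the repository stores a three-line text stub (`version`, `oid sha256:…`, `size`) in place of the binary. The sketch below illustrates this under stated assumptions: `make_lfs_pointer` is a hypothetical helper written for this note, and the payload is a placeholder of 474,000 zero bytes, not the real image.

```python
import hashlib

# Version line used by Git LFS pointer files.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer for the given file content.

    Hypothetical helper for illustration; Git LFS itself writes these
    stubs when a path matches a `filter=lfs` rule in .gitattributes.
    """
    oid = hashlib.sha256(content).hexdigest()  # 64 hex characters
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"


# Placeholder payload (assumption): any file whose byte count has six digits.
payload = b"\0" * 474_000
pointer = make_lfs_pointer(payload)

print(pointer)
print(len(pointer.encode("utf-8")))  # 43 + 76 + 12 bytes for a six-digit size
```

The pointer length depends only on the number of digits in the file size, which is why files between 170 kB and 474 kB all show the same pointer size.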