---
license: apache-2.0
tags:
- Image-to-Video
- image-to-video
- image
- video
- wan2.2
---

Generated frames are limited to 21 (1.3 sec at 1280*704) to fit within 8 GB of VRAM.
For a full 5-second video, the maximum resolution is 720*405 (on 8 GB VRAM).
* Tested on a HELIOS PREDATOR 300 laptop (3070 Ti, 8 GB) at 1280*704: 72.22 s/it for 21 frames, 60.74 s/it for 17 frames, 48.70 s/it for 13 frames.

**Optimized I2V-A14B**: run long video generation in a loop with **loop.bat**.

https://github.com/user-attachments/assets/154df173-88d3-4ad1-b543-f7410380b13a

## How it works

- Edit the prompt in **loop.bat** and run it. The command runs in a loop, and each iteration performs one step: create a latent from the image -> `y_latents.pt`, run inference -> `final_latents.pt`, decode the video from `final_latents.pt` and save the last frame -> `last_frame_latents.pt`, create a latent from the last frame `last_frame_latents.pt` -> `y_latents.pt`, run inference, and so on (a Python sketch of this loop is given at the end of this page).
- **To start a new generation loop** with a new image / prompt / frame count / size, delete **y_latents.pt**, **final_latents.pt**, and **last_frame_latents.pt**.

## Results on a 3070 Ti laptop GPU with 8 GB VRAM

**Size 640*352**
- 81 frames: 58.23 s/it (FP16), 51.32 s/it (*FP8)
- 33 frames: 23.75 s/it, VAE decode 4.5 sec

**Size 704*396, sampling_steps 25+**
- 49 frames: 24.72 s/it (FP16)
- 81 frames: 77.50 s/it (FP16)

**Size 720*405, sampling_steps 20+**
- 17 frames: 21.23 s/it (FP16)
- 77 frames: 82.11 s/it (FP16)
- 81 frames (best): 70.74 s/it (*FP8), VAE decode 12.2 sec

**Size 832*464 / 848*448, sampling_steps 20+**
- 17 frames: 23.68 s/it, VAE decode 3.54 sec
- 53 frames: 74.34 s/it
- 65 frames: 79.73 s/it

**Size 960*540, sampling_steps 16+**
- 17 frames: 34.30 s/it (FP16)
- 41 frames: 75.02 s/it (FP16)
- 45 frames: 72.35 s/it (*FP8), VAE decode 11.7 sec

With 8 GB VRAM and sizes above 960*540, the VAE falls back to slow shared video memory.

**Size 1120*630**
- 13 frames: 29.24 s/it (*FP8), VAE decode 18 sec
- 33 frames (max): 85.10 s/it (FP16), VAE decode 57.93 sec (10.1 GB)
- 33 frames: 76.49 s/it (*FP8)
- 37 frames: 85.16 s/it (*FP8), VAE decode 75.73 sec

**Size 1280*720, sampling_steps 16+**
- 13 frames: 48.70 s/it (FP16), VAE decode 99 sec
- 13 frames: 39.61 s/it (*FP8)
- 17 frames: 60.74 s/it (FP16)
- 17 frames: 54.02 s/it (*FP8)
- 21 frames (max): 72.22 s/it (FP16)
- 21 frames: 66.18 s/it (*FP8), VAE decode 152 sec

**Size 1600*896 / 1568*896, sampling_steps 15+**
- 13 frames (max): 85.47 s/it (FP16)
- 13 frames (max): 63.88 s/it (*FP8), VAE decode 284 sec

## Compared to ComfyUI (1120*630, 33 frames, 16 steps)

|            | ComfyUI (fp8) | This repo (FP16), optimized VAE | This repo (*FP8), optimized VAE |
|------------|---------------|---------------------------------|---------------------------------|
| Sampling   | 1470 sec      | 85.10 s/it * 16 = 1362 sec      | 76.49 s/it * 16 = 1224 sec      |
| VAE decode | +117 sec      | +58 sec                         | +58 sec                         |
| Total      | 1587 sec      | 1420 sec (1.12x faster)         | 1282 sec (1.24x faster)         |

*FP8: the 3070 Ti does not support FP8 computation, so weights loaded in FP8 are converted to FP16 "on the fly" for computation. Visually it is hard to notice a quality difference between FP8 and FP16.

To try FP16: download the original model, run convert_files / optimize_files, put the resulting safetensors in `./Wan2.2-I2V-A14B/(low_noise_model/high_noise_model)`, and switch `self.load_as_fp8 = True` to `False` in `image2videolocal.py`.

More details: [https://github.com/nalexand/Wan2.2](https://github.com/nalexand/Wan2.2)
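
## Sketch of the generation loop

As an illustration of the loop described in "How it works" above, here is a minimal, hypothetical Python sketch of the file-driven state machine that **loop.bat** cycles through. The step functions (`encode_image_to_latent`, `run_inference`, `decode_video`) and the `start_image` argument are placeholder names chosen for this example, not the project's actual API; in the real repository each step corresponds to a run of `image2videolocal.py`.

```python
# Hypothetical sketch (not the actual project code) of the file-driven state machine
# that loop.bat cycles through. The placeholder steps only touch marker files so the
# sketch can be run and observed; the real work is done by image2videolocal.py.
import os
import pathlib

Y_LAT = "y_latents.pt"               # conditioning latent for the current segment
FINAL_LAT = "final_latents.pt"       # denoised latents produced by inference
LAST_LAT = "last_frame_latents.pt"   # latent of the last decoded frame

def encode_image_to_latent(src: str, out: str) -> None:
    """Placeholder for: encode src (input image or last-frame latent) -> y_latents.pt."""
    print(f"[encode] {src} -> {out}")
    pathlib.Path(out).touch()

def run_inference(cond: str, out: str) -> None:
    """Placeholder for: run the I2V-A14B sampler conditioned on y_latents.pt -> final_latents.pt."""
    print(f"[sample] {cond} -> {out}")
    pathlib.Path(out).touch()

def decode_video(latents: str, last_frame_out: str) -> None:
    """Placeholder for: VAE-decode final_latents.pt to a video chunk and keep the last frame latent."""
    print(f"[decode] {latents} -> video chunk + {last_frame_out}")
    pathlib.Path(last_frame_out).touch()

def one_step(start_image: str = "input.jpg") -> None:
    if not os.path.exists(Y_LAT):
        # No conditioning latent yet: encode the input image, or the last frame
        # of the previous segment if one exists, into y_latents.pt.
        src = LAST_LAT if os.path.exists(LAST_LAT) else start_image
        encode_image_to_latent(src, out=Y_LAT)
    elif not os.path.exists(FINAL_LAT):
        # Conditioning latent exists but no result yet: run inference.
        run_inference(cond=Y_LAT, out=FINAL_LAT)
    else:
        # Result exists: decode it and re-arm the loop for the next segment.
        decode_video(FINAL_LAT, last_frame_out=LAST_LAT)
        os.remove(FINAL_LAT)
        os.remove(Y_LAT)  # next call re-encodes y_latents.pt from last_frame_latents.pt

if __name__ == "__main__":
    one_step()  # loop.bat re-runs one step per iteration
```

Because the state lives entirely in the three `.pt` files, deleting `y_latents.pt`, `final_latents.pt`, and `last_frame_latents.pt` resets the loop, which is exactly the "start a new generation loop" instruction above.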