Update README.md
# Key Feature

- **Open Source**: Full [model weights](https://huggingface.co/rhymes-ai/Allegro-TI2V) and [code](https://github.com/rhymes-ai/Allegro) available to the community, Apache 2.0!
- **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
- **Text-Image-to-Video Generation**: Generate videos from user-provided prompts and images. Supported input types include:
  - Generating subsequent video content from a user prompt and first frame image.

# Quick start

1. **Download the [Allegro GitHub code](https://github.com/rhymes-ai/Allegro).**

2. **Install the necessary requirements.**

   - Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4. For details, see [requirements.txt](https://github.com/rhymes-ai/Allegro/blob/main/requirements.txt).
   - It is recommended to use Anaconda to create a new environment (Python >= 3.10) to run the following example.

3. **Download the [Allegro-TI2V model weights](https://huggingface.co/rhymes-ai/Allegro-TI2V).**

4. **Run inference.**
```bash
--seed 1427329220
```

The output video resolution is fixed at 720 × 1280. Input images with different resolutions will be automatically cropped and resized to fit (see the illustrative preprocessing sketch after these steps).

| Argument | Description |
|------------------------|-------------------------------------------------------------|
| `--last_frame` | [Optional] If provided, the model will generate intermediate video content based on the specified first and last frame images. |
| `--enable_cpu_offload` | [Optional] Offload the model to the CPU to reduce GPU memory usage (about 9.3 GB, compared to 27.5 GB without CPU offload), at the cost of significantly longer inference time. |

5. **(Optional) Interpolate the video to 30 FPS.**

   - It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMAVFI) to interpolate the video from 15 FPS to 30 FPS.
   - For better visual quality, you can use imageio to save the video (see the sketch after these steps).
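
The automatic crop-and-resize mentioned under step 4 is handled by the inference script itself, so nothing extra is required. Purely as an illustration, a minimal sketch of that kind of preprocessing is shown below, assuming Pillow is installed, that 720 × 1280 means a 720-tall, 1280-wide frame, and using a hypothetical helper name; the official code may make different resampling and cropping choices.

```python
from PIL import Image

# Assumed output resolution: 1280 wide x 720 tall (the card states 720 x 1280).
TARGET_W, TARGET_H = 1280, 720

def fit_to_allegro_resolution(path: str) -> Image.Image:
    """Scale an image so it covers 1280x720, then center-crop the overflow."""
    img = Image.open(path).convert("RGB")
    # Scale factor that makes the image at least as large as the target in both dimensions.
    scale = max(TARGET_W / img.width, TARGET_H / img.height)
    resized = img.resize(
        (round(img.width * scale), round(img.height * scale)),
        Image.Resampling.BICUBIC,
    )
    # Center-crop to exactly 1280x720.
    left = (resized.width - TARGET_W) // 2
    top = (resized.height - TARGET_H) // 2
    return resized.crop((left, top, left + TARGET_W, top + TARGET_H))

# Example (hypothetical paths): pre-crop a first-frame image before inference.
# fit_to_allegro_resolution("first_frame.png").save("first_frame_1280x720.png")
```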
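
For the imageio suggestion in step 5, a minimal sketch of writing frames to an MP4 at 30 FPS might look like the following. It assumes `frames` is a sequence of H x W x 3 uint8 RGB arrays (for example, frames produced by EMA-VFI interpolation) and that the `imageio` and `imageio-ffmpeg` packages are installed; the output path, codec, and quality value are placeholders, not settings from the Allegro repo.

```python
import imageio

def save_video_30fps(frames, path="output_30fps.mp4"):
    """Write RGB frames (H x W x 3, uint8) to an MP4 file at 30 FPS."""
    writer = imageio.get_writer(path, fps=30, codec="libx264", quality=8)
    for frame in frames:
        writer.append_data(frame)
    writer.close()

# Example (assuming `interpolated_frames` holds the 30 FPS frame sequence):
# save_video_30fps(interpolated_frames)
```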