Update README.md
# Key Feature

- **Open Source**: Full [model weights](https://huggingface.co/rhymes-ai/Allegro-TI2V) and [code](https://github.com/rhymes-ai/Allegro) available to the community, Apache 2.0!
- **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
- **Text-Image-to-Video Generation**: Generate videos from user-provided prompts and images. Supported input types include:
  - Generating subsequent video content from a user prompt and first frame image.

# Quick start

1. **Download the [Allegro GitHub code](https://github.com/rhymes-ai/Allegro).**

2. **Install the necessary requirements.**

   - Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4. For details, see [requirements.txt](https://github.com/rhymes-ai/Allegro/blob/main/requirements.txt).
   - It is recommended to use Anaconda to create a new environment (Python >= 3.10) to run the following example.

3. **Download the [Allegro-TI2V model weights](https://huggingface.co/rhymes-ai/Allegro-TI2V).**

4. **Run inference.**
```bash
--seed 1427329220
```

The output video resolution is fixed at 720 × 1280. Input images with different resolutions will be automatically cropped and resized to fit (see the illustrative preprocessing sketch after these steps).

| Argument | Description |
|------------------------|-------------------------------------------------------------|
| `--last_frame` | [Optional] If provided, the model will generate intermediate video content based on the specified first and last frame images. |
| `--enable_cpu_offload` | [Optional] Offload the model to the CPU to reduce GPU memory usage (about 9.3 GB, compared to 27.5 GB without CPU offload), at the cost of significantly longer inference time. |

5. **(Optional) Interpolate the video to 30 FPS.**

   - It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMAVFI) to interpolate the video from 15 FPS to 30 FPS.
   - For better visual quality, you can use imageio to save the video (see the sketch after these steps).
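
The automatic crop-and-resize mentioned under step 4 is handled by the inference script itself, so nothing extra is required. Purely as an illustration, a minimal sketch of that kind of preprocessing is shown below, assuming Pillow is installed, that 720 × 1280 means a 720-tall, 1280-wide frame, and using a hypothetical helper name; the official code may make different resampling and cropping choices.

```python
from PIL import Image

# Assumed output resolution: 1280 wide x 720 tall (the card states 720 x 1280).
TARGET_W, TARGET_H = 1280, 720

def fit_to_allegro_resolution(path: str) -> Image.Image:
    """Scale an image so it covers 1280x720, then center-crop the overflow."""
    img = Image.open(path).convert("RGB")
    # Scale factor that makes the image at least as large as the target in both dimensions.
    scale = max(TARGET_W / img.width, TARGET_H / img.height)
    resized = img.resize(
        (round(img.width * scale), round(img.height * scale)),
        Image.Resampling.BICUBIC,
    )
    # Center-crop to exactly 1280x720.
    left = (resized.width - TARGET_W) // 2
    top = (resized.height - TARGET_H) // 2
    return resized.crop((left, top, left + TARGET_W, top + TARGET_H))

# Example (hypothetical paths): pre-crop a first-frame image before inference.
# fit_to_allegro_resolution("first_frame.png").save("first_frame_1280x720.png")
```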
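
For the imageio suggestion in step 5, a minimal sketch of writing frames to an MP4 at 30 FPS might look like the following. It assumes `frames` is a sequence of H x W x 3 uint8 RGB arrays (for example, frames produced by EMA-VFI interpolation) and that the `imageio` and `imageio-ffmpeg` packages are installed; the output path, codec, and quality value are placeholders, not settings from the Allegro repo.

```python
import imageio

def save_video_30fps(frames, path="output_30fps.mp4"):
    """Write RGB frames (H x W x 3, uint8) to an MP4 file at 30 FPS."""
    writer = imageio.get_writer(path, fps=30, codec="libx264", quality=8)
    for frame in frames:
        writer.append_data(frame)
    writer.close()

# Example (assuming `interpolated_frames` holds the 30 FPS frame sequence):
# save_video_30fps(interpolated_frames)
```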