---
license: mit
language:
- en
base_model:
- Qwen/Qwen-Image-Edit-2509
base_model_relation: adapter
---

<p align="center">
  <img src="./MotionEdit.png" width="500"/>
</p>

# MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

[Project Page](https://motion-edit.github.io/)
[Code](https://github.com/elainew728/motion-edit/tree/main)
[Dataset](https://huggingface.co/datasets/elaine1wan/MotionEdit-Bench)
[X (Twitter)](https://x.com/yixin_wan_?s=21&t=EqTxUZPAldbQnbhLN-CETA)
[Demo Page](https://motion-edit.github.io/) <br>

# ✨ Overview
**MotionEdit** is a novel dataset and benchmark for motion-centric image editing. We also propose **MotionNFT** (Motion-guided Negative-aware FineTuning), a post-training framework with motion-alignment rewards that guides models on the motion-centric image editing task.

### Model Description
- **Model type:** Image Editing
- **Language(s):** English
- **Finetuned from model:** Qwen/Qwen-Image-Edit-2509

### Model Sources
- **Repository:** https://github.com/elainew728/motion-edit/tree/main
- **Paper:** https://arxiv.org/abs/2512.10284
- **Demo Page:** https://motion-edit.github.io/

# 🔧 Usage
## 🧱 To Start: Environment Setup
Clone our GitHub repository and switch into its directory.

```
git clone https://github.com/elainew728/motion-edit.git
cd motion-edit
```

Create and activate the conda environment, which contains the dependencies needed for both inference and training.

> * **Note:** Some models, such as UltraEdit, require specific versions of the diffusers library. Please refer to their official repositories to resolve these dependencies before running inference.

```
conda env create -f environment.yml
conda activate motionedit
```

Finally, configure your own Hugging Face token to access restricted models by replacing `YOUR_HF_TOKEN_HERE` in [inference/run_image_editing.py](https://github.com/elainew728/motion-edit/tree/main/inference/run_image_editing.py).

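
If you prefer not to edit the script, a common alternative is to authenticate once per environment with the `huggingface_hub` library (typically pulled in as a dependency of diffusers/datasets) or via the `HF_TOKEN` environment variable. The snippet below is a minimal sketch of that alternative, not the repository's own mechanism:

```
# Hypothetical alternative to hard-coding the token in run_image_editing.py:
# log in once so that downstream Hugging Face downloads are authenticated.
from huggingface_hub import login

login(token="YOUR_HF_TOKEN_HERE")  # or export HF_TOKEN in your shell instead
```
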
## 🔍 Inferencing on *MotionEdit-Bench* with Image Editing Models
We have released our [MotionEdit-Bench](https://huggingface.co/datasets/elaine1wan/MotionEdit-Bench) on Hugging Face.
The GitHub repository provides code that supports easy inference across open-source image editing models: ***Qwen-Image-Edit***, ***Flux.1 Kontext [Dev]***, ***InstructPix2Pix***, ***HQ-Edit***, ***Step1X-Edit***, ***UltraEdit***, ***MagicBrush***, and ***AnyEdit***.

### Step 1: Data Preparation
The inference script defaults to using our [MotionEdit-Bench](https://huggingface.co/datasets/elaine1wan/MotionEdit-Bench), which is downloaded automatically from Hugging Face. You can specify a `cache_dir` for storing the cached data.

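
For reference, the benchmark can also be pulled down directly with the `datasets` library. This is a minimal sketch with a hypothetical cache path; the inference script handles this step for you:

```
from datasets import load_dataset

# Download MotionEdit-Bench from the Hugging Face Hub.
# "./hf_cache" is a hypothetical cache location; adjust as needed.
bench = load_dataset("elaine1wan/MotionEdit-Bench", cache_dir="./hf_cache")
print(bench)  # shows the available splits and columns
```
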
Additionally, you can construct your own dataset for inference. Organize all input images into a folder `INPUT_FOLDER` and create a `metadata.jsonl` file in the same directory. Each entry in `metadata.jsonl` **must** contain at least the following two fields:
```
{
    "file_name": IMAGE_NAME.EXT,
    "prompt": PROMPT
}
```

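
For illustration, a hypothetical `metadata.jsonl` with two made-up entries (the file names and prompts are placeholders, not part of the benchmark) would look like:

```
{"file_name": "cat_jump.png", "prompt": "Make the cat leap off the sofa."}
{"file_name": "runner_042.jpg", "prompt": "Change the runner's pose so both feet are off the ground."}
```
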
Then, load your dataset with:
```
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir=INPUT_FOLDER)
```

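
Once loaded, the `imagefolder` builder exposes each image alongside its metadata fields. A small sketch of inspecting the first example, assuming the default `train` split:

```
# Each example has an "image" column (a PIL image) plus one column per
# metadata field from metadata.jsonl, e.g. "prompt".
sample = dataset["train"][0]
print(sample["prompt"])
sample["image"].save("first_input.png")  # or sample["image"].show()
```
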
### Step 2: Running Inference
Use the following command to run inference on **MotionEdit-Bench** with our ***MotionNFT*** Hugging Face checkpoint, trained on **MotionEdit** with Qwen-Image-Edit as the base model:
```
python inference/run_image_editing.py \
    -o "./outputs/" \
    -m "motionedit" \
    --seed 42
```

<!-- ## Authors
[Yixin Wan](https://elainew728.github.io/)<sup>1,2</sup>, [Lei Ke](https://www.kelei.site/)<sup>1</sup>, [Wenhao Yu](https://wyu97.github.io/)<sup>1</sup>, [Kai-Wei Chang](https://web.cs.ucla.edu/~kwchang/)<sup>2</sup>, [Dong Yu](https://sites.google.com/view/dongyu888/)<sup>1</sup>

<sup>1</sup>Tencent AI, Seattle <sup>2</sup>University of California, Los Angeles
-->

# ✏️ Citing

```bibtex
@misc{wan2025motioneditbenchmarkinglearningmotioncentric,
      title={MotionEdit: Benchmarking and Learning Motion-Centric Image Editing},
      author={Yixin Wan and Lei Ke and Wenhao Yu and Kai-Wei Chang and Dong Yu},
      year={2025},
      eprint={2512.10284},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.10284},
}
```