LEGION-8B-replicate

Overview

The project LEGION: Learning to Ground and Explain for Synthetic Image Detection open-sourced its code repository but did not release pre-trained weights. We replicated the model from the open-source code and the paper, and are now releasing our replicated weights.

Because of potential discrepancies introduced during replication, the released weights may score lower than the officially reported results on some benchmarks.

Training Details

We conducted training on 4x A100 40GB GPUs.

For the first training stage, the official configuration uses 8 GPUs with a global batch size of 16 (per-device batch size = 2). To keep the same global batch size, we used 4 GPUs with a per-device batch size of 4.

For the second training stage, the official configuration uses 8 GPUs with a global batch size of 512 (per-device batch size = 64). We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps, for an effective per-device batch size of 128 (8 x 16) and the same global batch size of 512.
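As a quick sanity check, the arithmetic behind these choices is sketched below; the function name is ours, not from the LEGION codebase.

```python
# Global batch size = num_gpus * per-device batch size * gradient accumulation steps.
def global_batch(num_gpus: int, per_device: int, grad_accum: int = 1) -> int:
    return num_gpus * per_device * grad_accum

assert global_batch(8, 2) == global_batch(4, 4) == 16        # stage 1
assert global_batch(8, 64) == global_batch(4, 8, 16) == 512  # stage 2
```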

Inference Usage

A simple inference script is provided at infer.py.

Usage instructions are as follows:

cp infer.py /path/to/LEGION
python infer.py --model_path /path/to/LEGION-8B-replicate --image_root /path/to/images --save_root /path/to/results
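If you need to process several image folders in one go, a hypothetical wrapper like the one below can loop over them. The paths are placeholders, and only the CLI flags shown above are assumed to exist.

```python
# Hypothetical batch driver for infer.py; adjust paths to your layout.
import subprocess
from pathlib import Path

MODEL = "/path/to/LEGION-8B-replicate"

for image_root in Path("/path/to/datasets").iterdir():
    if not image_root.is_dir():
        continue
    # One infer.py run per dataset folder, results kept separate per folder.
    subprocess.run(
        [
            "python", "infer.py",
            "--model_path", MODEL,
            "--image_root", str(image_root),
            "--save_root", f"/path/to/results/{image_root.name}",
        ],
        check=True,
    )
```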

Examples

[Example: original image (left) and the corresponding artifact mask generated by LEGION-8B-replicate (right).]

Model output: "Upon examining the image, I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw: The cat's right front paw is missing. Cat's left front paw: The cat's left front paw is deformed."

Performance

Because the evaluation and metric code has not been open-sourced, the test results below may be inaccurate. In particular, the IoU metric for masks may be affected by mask post-processing during inference, which can lower the scores.
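For reference, below is a minimal mask-IoU sketch reflecting our assumption of how the metric is computed; since the official metric code is unreleased, it may not match the paper's implementation.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape (our assumption of the metric)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter / union)
```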

Localization

| Method | SynthScars mIoU | SynthScars F1 | LOKI mIoU | LOKI F1 | RichHF-18K mIoU | RichHF-18K F1 |
|---|---|---|---|---|---|---|
| HiFi-Net | 45.65 | 0.57 | 39.60 | 2.41 | 44.96 | 0.39 |
| TruFor | 48.60 | 15.29 | 46.55 | 16.70 | 48.41 | 18.03 |
| PAL4VST | 56.10 | 29.21 | 47.34 | 11.58 | 49.88 | 14.78 |
| Ferret | 27.09 | 15.24 | 24.50 | 18.88 | 26.52 | 16.22 |
| Griffon | 27.68 | 16.67 | 21.96 | 20.41 | 28.13 | 18.19 |
| LISA-v1-7B | 34.51 | 18.77 | 31.10 | 9.29 | 35.90 | 21.94 |
| InternVL2-8B | 41.25 | 6.39 | 42.03 | 10.06 | 39.90 | 9.58 |
| Qwen2-VL-72B | 30.20 | 17.50 | 26.62 | 20.99 | 27.58 | 19.02 |
| LEGION (Official) | 58.13 | 34.54 | 48.66 | 16.71 | 50.07 | 17.41 |
| LEGION (Replicate) | 23.92 | 33.47 | - | - | - | - |

Explanation

| Method | Params | SynthScars ROUGE-L ↑ | SynthScars CSS ↑ | LOKI ROUGE-L ↑ | LOKI CSS ↑ |
|---|---|---|---|---|---|
| Qwen2-VL | 72B | 25.84 | 58.15 | 11.80 | 37.64 |
| LLaVA-v1.6 | 7B | 29.61 | 61.75 | 16.07 | 41.07 |
| InternVL2 | 8B | 25.93 | 56.89 | 10.10 | 39.62 |
| Deepseek-VL2 | 27B | 25.50 | 47.77 | 6.70 | 28.76 |
| GPT-4o | - | 22.43 | 53.55 | 9.61 | 38.98 |
| LEGION (Official) | 8B | 39.50 | 72.60 | 18.55 | 45.96 |
| LEGION (Replicate) | 8B | 50.57 | - | - | - |
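Since the metric code is not public, here is a hedged sketch of how ROUGE-L can be computed with Google's rouge-score package (pip install rouge-score); this is our assumption, not necessarily the official scorer. CSS (cosine semantic similarity) is omitted because the official embedding model is unknown.

```python
from rouge_score import rouge_scorer

# ROUGE-L with stemming, as an assumed stand-in for the official scorer.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(
    "The cat's right front paw is missing.",    # reference explanation
    "The right front paw of the cat is gone.",  # generated explanation
)
print(scores["rougeL"].fmeasure)
```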

Detection

| Method | GANs | Deepfakes | Perceptual Loss (CRN) | Perceptual Loss (IMLE) | Low Level Vision (SITD) | Low Level Vision (SAN) | Diffusion |
|---|---|---|---|---|---|---|---|
| Co-occurrence | 75.17 | 59.14 | 73.06 | 87.21 | 68.98 | 60.42 | 85.53 |
| Freq-spec | 75.28 | 45.18 | 53.61 | 50.98 | 47.46 | 57.12 | 69.00 |
| CNNSpot | 85.29 | 53.47 | 86.31 | 86.26 | 66.67 | 48.69 | 58.63 |
| Patchfor | 69.97 | 75.54 | 72.33 | 55.30 | 75.14 | 75.28 | 72.54 |
| UniFD | 95.25 | 66.60 | 59.50 | 72.00 | 63.00 | 57.50 | 82.02 |
| LDGard | 89.17 | 58.00 | 50.74 | 50.78 | 62.50 | 50.00 | 89.79 |
| FreqNet | 94.23 | 97.40 | 71.92 | 67.35 | 88.92 | 59.04 | 83.34 |
| NPR | 94.16 | 76.89 | 50.00 | 50.00 | 66.94 | 98.63 | 94.54 |
| LEGION (Official) | 97.01 | 63.37 | 90.78 | 98.93 | 79.44 | 57.76 | 83.10 |
| LEGION (Replicate) | 91.48 | 79.16 | 84.73 | 96.71 | 78.06 | 53.70 | - |

Acknowledgements

Thanks to Gennadiyev for providing computational resources and moral support, and for helping me complete the reproduction.

Thanks to draw-your-dream/LEGION for fixing bugs in the first-stage training.
