LEGION-8B-replicate

Overview

The project LEGION: Learning to Ground and Explain for Synthetic Image Detection open-sourced its code repository but did not release pre-trained weights. We replicated the model from the open-source code and the paper, and are now releasing our replicated weights.

Because of potential discrepancies introduced during replication, the released weights may score lower than the officially reported results on some benchmarks.

Training Details

We conducted training on 4x A100 40GB GPUs.

For the first training stage, the official configuration uses 8 GPUs with a global batch size of 16 (per-device batch size = 2). To keep the same global batch size, we used 4 GPUs with a per-device batch size of 4.

For the second training stage, the official configuration uses 8 GPUs with a global batch size of 512 (per-device batch size = 64). We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps, for an effective per-device batch size of 128 (8 x 16) and the same global batch size of 512.
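As a quick sanity check, the arithmetic behind these choices is sketched below; the function name is ours, not from the LEGION codebase.

```python
# Global batch size = num_gpus * per-device batch size * gradient accumulation steps.
def global_batch(num_gpus: int, per_device: int, grad_accum: int = 1) -> int:
    return num_gpus * per_device * grad_accum

assert global_batch(8, 2) == global_batch(4, 4) == 16        # stage 1
assert global_batch(8, 64) == global_batch(4, 8, 16) == 512  # stage 2
```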

Inference Usage

A simple inference script is provided at infer.py.

Usage instructions are as follows:

cp infer.py /path/to/LEGION
python infer.py --model_path /path/to/LEGION-8B-replicate --image_root /path/to/images --save_root /path/to/results
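If you need to process several image folders in one go, a hypothetical wrapper like the one below can loop over them. The paths are placeholders, and only the CLI flags shown above are assumed to exist.

```python
# Hypothetical batch driver for infer.py; adjust paths to your layout.
import subprocess
from pathlib import Path

MODEL = "/path/to/LEGION-8B-replicate"

for image_root in Path("/path/to/datasets").iterdir():
    if not image_root.is_dir():
        continue
    # One infer.py run per dataset folder, results kept separate per folder.
    subprocess.run(
        [
            "python", "infer.py",
            "--model_path", MODEL,
            "--image_root", str(image_root),
            "--save_root", f"/path/to/results/{image_root.name}",
        ],
        check=True,
    )
```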

Examples

[Example: original image (left) and the corresponding artifact mask generated by LEGION-8B-replicate (right).]

Model output: "Upon examining the image, I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw: The cat's right front paw is missing. Cat's left front paw: The cat's left front paw is deformed."

Performance

Because the evaluation and metric code has not been open-sourced, the test results below may be inaccurate. In particular, the IoU metric for masks may be affected by mask post-processing during inference, which can lower the scores.
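For reference, below is a minimal mask-IoU sketch reflecting our assumption of how the metric is computed; since the official metric code is unreleased, it may not match the paper's implementation.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape (our assumption of the metric)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter / union)
```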

Localization

| Method | SynthScars mIoU | SynthScars F1 | LOKI mIoU | LOKI F1 | RichHF-18K mIoU | RichHF-18K F1 |
|---|---|---|---|---|---|---|
| HiFi-Net | 45.65 | 0.57 | 39.60 | 2.41 | 44.96 | 0.39 |
| TruFor | 48.60 | 15.29 | 46.55 | 16.70 | 48.41 | 18.03 |
| PAL4VST | 56.10 | 29.21 | 47.34 | 11.58 | 49.88 | 14.78 |
| Ferret | 27.09 | 15.24 | 24.50 | 18.88 | 26.52 | 16.22 |
| Griffon | 27.68 | 16.67 | 21.96 | 20.41 | 28.13 | 18.19 |
| LISA-v1-7B | 34.51 | 18.77 | 31.10 | 9.29 | 35.90 | 21.94 |
| InternVL2-8B | 41.25 | 6.39 | 42.03 | 10.06 | 39.90 | 9.58 |
| Qwen2-VL-72B | 30.20 | 17.50 | 26.62 | 20.99 | 27.58 | 19.02 |
| LEGION (Official) | 58.13 | 34.54 | 48.66 | 16.71 | 50.07 | 17.41 |
| LEGION (Replicate) | 23.92 | 33.47 | - | - | - | - |

Explanation

| Method | Params | SynthScars ROUGE-L ↑ | SynthScars CSS ↑ | LOKI ROUGE-L ↑ | LOKI CSS ↑ |
|---|---|---|---|---|---|
| Qwen2-VL | 72B | 25.84 | 58.15 | 11.80 | 37.64 |
| LLaVA-v1.6 | 7B | 29.61 | 61.75 | 16.07 | 41.07 |
| InternVL2 | 8B | 25.93 | 56.89 | 10.10 | 39.62 |
| Deepseek-VL2 | 27B | 25.50 | 47.77 | 6.70 | 28.76 |
| GPT-4o | - | 22.43 | 53.55 | 9.61 | 38.98 |
| LEGION (Official) | 8B | 39.50 | 72.60 | 18.55 | 45.96 |
| LEGION (Replicate) | 8B | 50.57 | - | - | - |
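Since the metric code is not public, here is a hedged sketch of how ROUGE-L can be computed with Google's rouge-score package (pip install rouge-score); this is our assumption, not necessarily the official scorer. CSS (cosine semantic similarity) is omitted because the official embedding model is unknown.

```python
from rouge_score import rouge_scorer

# ROUGE-L with stemming, as an assumed stand-in for the official scorer.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(
    "The cat's right front paw is missing.",    # reference explanation
    "The right front paw of the cat is gone.",  # generated explanation
)
print(scores["rougeL"].fmeasure)
```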

Detection

| Method | GANs | Deepfakes | Perceptual Loss (CRN) | Perceptual Loss (IMLE) | Low Level Vision (SITD) | Low Level Vision (SAN) | Diffusion |
|---|---|---|---|---|---|---|---|
| Co-occurrence | 75.17 | 59.14 | 73.06 | 87.21 | 68.98 | 60.42 | 85.53 |
| Freq-spec | 75.28 | 45.18 | 53.61 | 50.98 | 47.46 | 57.12 | 69.00 |
| CNNSpot | 85.29 | 53.47 | 86.31 | 86.26 | 66.67 | 48.69 | 58.63 |
| Patchfor | 69.97 | 75.54 | 72.33 | 55.30 | 75.14 | 75.28 | 72.54 |
| UniFD | 95.25 | 66.60 | 59.50 | 72.00 | 63.00 | 57.50 | 82.02 |
| LDGard | 89.17 | 58.00 | 50.74 | 50.78 | 62.50 | 50.00 | 89.79 |
| FreqNet | 94.23 | 97.40 | 71.92 | 67.35 | 88.92 | 59.04 | 83.34 |
| NPR | 94.16 | 76.89 | 50.00 | 50.00 | 66.94 | 98.63 | 94.54 |
| LEGION (Official) | 97.01 | 63.37 | 90.78 | 98.93 | 79.44 | 57.76 | 83.10 |
| LEGION (Replicate) | 91.48 | 79.16 | 84.73 | 96.71 | 78.06 | 53.70 | - |

Acknowledgements

Thanks to Gennadiyev for providing computational resources and moral support, and for helping me complete the reproduction.

Thanks to draw-your-dream/LEGION for fixing bugs in the first-stage training.
