# Reproduction Verification Model
SiamFC-Light model for reproduction verification: given a clean ad tile (source)
and a fisheye camera frame (camera), predict whether the ad is correctly and fully
broadcast in the camera image.
Used by ad broadcasters to monitor whether ads are being aired as intended: fully visible, unoccluded, and actually displayed (not blacked out).
## Model
- Architecture: SiamFC-Light, a Siamese network with global source-descriptor matching and learned projection MLPs
- Matching: Global source descriptor (pooled source features) compared against every camera spatial location via cosine similarity
- Projections: Two-layer Conv2d(64→64→64) heads (`proj_s8`, `proj_s16`) transform backbone features into content-discriminative embeddings
- Features: 14 spatial statistics (7 per scale × 2 scales: max/mean/top-5/std cosine similarity, max/mean L2 distance, coverage fraction)
- Parameters: ~951K total, ~869K trainable
- Backbone: MobileNetV3-Small, 50% frozen (pretrained on ImageNet)
- NPU-optimized: all ops in TFLite op support list
- Input: 540×960 (H×W), ImageNet-normalized
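The global-descriptor matching above can be sketched as follows. This is a minimal illustration, not the model's actual code: the pooled source descriptor is compared by cosine similarity against every spatial location of the camera feature map, and the resulting similarity map is what the spatial statistics are computed from. Function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def global_match_map(source_feats: torch.Tensor, camera_feats: torch.Tensor) -> torch.Tensor:
    """Compare a pooled source descriptor against every camera location.

    source_feats: (B, C, Hs, Ws) projected features of the clean ad tile
    camera_feats: (B, C, Hc, Wc) projected features of the fisheye frame
    Returns a (B, Hc, Wc) cosine-similarity map.
    """
    desc = source_feats.mean(dim=(2, 3))        # global descriptor: (B, C)
    desc = F.normalize(desc, dim=1)             # unit-norm descriptor
    cam = F.normalize(camera_feats, dim=1)      # unit-norm per spatial location
    # Cosine similarity at every camera spatial location
    return torch.einsum("bc,bchw->bhw", desc, cam)

# Toy run with random features (channel dim 64, as in the projection heads)
sim = global_match_map(torch.randn(1, 64, 8, 8), torch.randn(1, 64, 17, 30))
stats = {"max": sim.max().item(), "mean": sim.mean().item()}
```

Statistics such as the max and mean of `sim` are examples of the per-scale features the classifier head consumes.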
## Match definition
MATCH (positive): source ad is visible in the camera frame, fully unoccluded and clearly displayed.
NO MATCH (negative): any of:
- Source ad is not present in the camera frame
- Camera screens are blacked out (display off or no signal)
- Source ad is present but partially or fully occluded
## Results
7-fold leave-one-camera-out cross-validation on 54 real fisheye pairs:
| Metric | Target | CV Average | Status |
|---|---|---|---|
| FN Rate | โค 5% | 28.6% | In progress |
| FP Rate | โค 20% | 17.1% | PASS |
| Precision | โฅ 67% | 81% | PASS |
Threshold: 0.35
FN = false alarm: when the model predicts NO MATCH on a correctly broadcast ad, the monitoring system raises a spurious anomaly alert. Keeping the FN rate low is the top priority.
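To make the metric definitions concrete, here is a small sketch (hypothetical helper names) of how the threshold turns a similarity score into a decision, and how the FN and FP rates in the table are computed:

```python
def classify(score: float, threshold: float = 0.35) -> str:
    """Score at or above the threshold means the ad is judged correctly broadcast."""
    return "MATCH" if score >= threshold else "NO MATCH"

def rates(y_true, y_pred):
    """FN rate: fraction of true MATCH pairs predicted NO MATCH (spurious alerts).
    FP rate: fraction of true NO MATCH pairs predicted MATCH (missed anomalies)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == "MATCH"]
    neg = [p for t, p in zip(y_true, y_pred) if t == "NO MATCH"]
    fn = sum(p == "NO MATCH" for p in pos) / len(pos)
    fp = sum(p == "MATCH" for p in neg) / len(neg)
    return fn, fp

# Toy example: 2 true-MATCH pairs (1 missed), 1 true-NO-MATCH pair (wrongly passed)
fn, fp = rates(["MATCH", "MATCH", "NO MATCH"],
               ["NO MATCH", "MATCH", "MATCH"])
# fn == 0.5, fp == 1.0
```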
## Training
3-stage domain-adapted pipeline:
- Stage 1: Panel localization pretraining on synthetic data (IoU=0.945), then domain adaptation on real data with pseudo-labels (IoU=0.868)
- Stage 2: Siamese match pretraining on 12K synthetic pairs with panel-aware backbone
- Stage 3: Fine-tune on 54 real fisheye pairs with 7-fold leave-one-camera-out CV
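The leave-one-camera-out protocol in Stage 3 can be sketched in plain Python: each of the 7 cameras is held out in turn so every camera is evaluated unseen. This is an illustrative reimplementation, not the project's actual split code; the camera grouping below is synthetic.

```python
from collections import defaultdict

def leave_one_camera_out(pairs):
    """pairs: list of (camera_id, sample) tuples.
    Yields one (held_out_camera, train, test) split per camera."""
    by_cam = defaultdict(list)
    for cam, sample in pairs:
        by_cam[cam].append(sample)
    for held_out in sorted(by_cam):
        test = by_cam[held_out]
        train = [s for cam in sorted(by_cam) if cam != held_out
                 for s in by_cam[cam]]
        yield held_out, train, test

# Synthetic example: 54 pairs spread across 7 cameras
pairs = [(f"cam{i % 7}", i) for i in range(54)]
folds = list(leave_one_camera_out(pairs))  # 7 folds
```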
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Load checkpoint (a full pickle, so weights_only=False is required)
ckpt_path = hf_hub_download("Luigi/reproduction-verification", "model_final_with_threshold.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
threshold = ckpt["threshold"]  # 0.35

# See the Space demo for the full inline model definition:
# https://huggingface.co/spaces/Luigi/reproduction-verification
```