# Reproduction Verification Model
SiamFC-Light model for reproduction verification: given a clean ad tile (source)
and a fisheye camera frame (camera), predict whether the ad is correctly and fully
broadcast in the camera image.
Used by ad broadcasters to monitor whether ads are being aired as intended: fully visible, unoccluded, and actually displayed (not blacked out).
## Model
- Architecture: SiamFC-Light, a Siamese network with global source-descriptor matching and learned projection MLPs
- Matching: Global source descriptor (pooled source features) compared against every camera spatial location via cosine similarity
- Projections: Two-layer Conv2d(64→64→64) heads (`proj_s8`, `proj_s16`) transform backbone features into content-discriminative embeddings
- Features: 14 spatial statistics (7 per scale × 2 scales: max/mean/top-5/std cosine similarity, max/mean L2 distance, coverage fraction)
- Parameters: ~951K total, ~869K trainable
- Backbone: MobileNetV3-Small, 50% frozen (pretrained on ImageNet)
- NPU-optimized: all ops in TFLite op support list
- Input: 540×960 (H×W), ImageNet-normalized
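The global-descriptor matching above can be sketched as follows. This is a minimal illustration, not the model's actual code: the pooled source descriptor is compared by cosine similarity against every spatial location of the camera feature map, and the resulting similarity map is what the spatial statistics are computed from. Function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def global_match_map(source_feats: torch.Tensor, camera_feats: torch.Tensor) -> torch.Tensor:
    """Compare a pooled source descriptor against every camera location.

    source_feats: (B, C, Hs, Ws) projected features of the clean ad tile
    camera_feats: (B, C, Hc, Wc) projected features of the fisheye frame
    Returns a (B, Hc, Wc) cosine-similarity map.
    """
    desc = source_feats.mean(dim=(2, 3))        # global descriptor: (B, C)
    desc = F.normalize(desc, dim=1)             # unit-norm descriptor
    cam = F.normalize(camera_feats, dim=1)      # unit-norm per spatial location
    # Cosine similarity at every camera spatial location
    return torch.einsum("bc,bchw->bhw", desc, cam)

# Toy run with random features (channel dim 64, as in the projection heads)
sim = global_match_map(torch.randn(1, 64, 8, 8), torch.randn(1, 64, 17, 30))
stats = {"max": sim.max().item(), "mean": sim.mean().item()}
```

Statistics such as the max and mean of `sim` are examples of the per-scale features the classifier head consumes.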
## Match definition
MATCH (positive): source ad is visible in the camera frame, fully unoccluded and clearly displayed.
NO MATCH (negative): any of:
- Source ad is not present in the camera frame
- Camera screens are blacked out (display off or no signal)
- Source ad is present but partially or fully occluded
## Results
7-fold leave-one-camera-out cross-validation on 54 real fisheye pairs:
| Metric | Target | CV Average | Status |
|---|---|---|---|
| FN Rate | โค 5% | 28.6% | In progress |
| FP Rate | โค 20% | 17.1% | PASS |
| Precision | โฅ 67% | 81% | PASS |
Threshold: 0.35
FN = false alarm: when the model predicts NO MATCH on a correctly broadcast ad, the monitoring system raises a spurious anomaly alert. Keeping the FN rate low is the top priority.
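To make the metric definitions concrete, here is a small sketch (hypothetical helper names) of how the threshold turns a similarity score into a decision, and how the FN and FP rates in the table are computed:

```python
def classify(score: float, threshold: float = 0.35) -> str:
    """Score at or above the threshold means the ad is judged correctly broadcast."""
    return "MATCH" if score >= threshold else "NO MATCH"

def rates(y_true, y_pred):
    """FN rate: fraction of true MATCH pairs predicted NO MATCH (spurious alerts).
    FP rate: fraction of true NO MATCH pairs predicted MATCH (missed anomalies)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == "MATCH"]
    neg = [p for t, p in zip(y_true, y_pred) if t == "NO MATCH"]
    fn = sum(p == "NO MATCH" for p in pos) / len(pos)
    fp = sum(p == "MATCH" for p in neg) / len(neg)
    return fn, fp

# Toy example: 2 true-MATCH pairs (1 missed), 1 true-NO-MATCH pair (wrongly passed)
fn, fp = rates(["MATCH", "MATCH", "NO MATCH"],
               ["NO MATCH", "MATCH", "MATCH"])
# fn == 0.5, fp == 1.0
```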
## Training
3-stage domain-adapted pipeline:
- Stage 1: Panel localization pretraining on synthetic data (IoU=0.945), then domain adaptation on real data with pseudo-labels (IoU=0.868)
- Stage 2: Siamese match pretraining on 12K synthetic pairs with panel-aware backbone
- Stage 3: Fine-tune on 54 real fisheye pairs with 7-fold leave-one-camera-out CV
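The leave-one-camera-out protocol in Stage 3 can be sketched in plain Python: each of the 7 cameras is held out in turn so every camera is evaluated unseen. This is an illustrative reimplementation, not the project's actual split code; the camera grouping below is synthetic.

```python
from collections import defaultdict

def leave_one_camera_out(pairs):
    """pairs: list of (camera_id, sample) tuples.
    Yields one (held_out_camera, train, test) split per camera."""
    by_cam = defaultdict(list)
    for cam, sample in pairs:
        by_cam[cam].append(sample)
    for held_out in sorted(by_cam):
        test = by_cam[held_out]
        train = [s for cam in sorted(by_cam) if cam != held_out
                 for s in by_cam[cam]]
        yield held_out, train, test

# Synthetic example: 54 pairs spread across 7 cameras
pairs = [(f"cam{i % 7}", i) for i in range(54)]
folds = list(leave_one_camera_out(pairs))  # 7 folds
```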
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Load checkpoint (a full pickle, so weights_only=False is required)
ckpt_path = hf_hub_download("Luigi/reproduction-verification", "model_final_with_threshold.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
threshold = ckpt["threshold"]  # 0.35

# See the Space demo for the full inline model definition:
# https://huggingface.co/spaces/Luigi/reproduction-verification
```