Reproduction Verification Model

SiamFC-Light model for reproduction verification: given a clean ad tile (source) and a fisheye camera frame (camera), predict whether the ad is correctly and fully broadcast in the camera image.

Used by ad broadcasters to monitor whether ads are being aired as intended: fully visible, unoccluded, and actually displayed (not blacked out).

Model

  • Architecture: SiamFC-Light, a Siamese network with global source descriptor matching and learned projection MLPs
  • Matching: Global source descriptor (pooled source features) compared against every camera spatial location via cosine similarity
  • Projections: Two-layer Conv2d(64→64→64) heads (proj_s8, proj_s16) transform backbone features into content-discriminative embeddings
  • Features: 14 spatial statistics (7 per scale × 2 scales: max/mean/top-5/std cosine similarity + max/mean L2 + coverage fraction)
  • Parameters: ~951K total, ~869K trainable
  • Backbone: MobileNetV3-Small, 50% frozen (pretrained on ImageNet)
  • NPU-optimized: all ops are in the TFLite supported-op list
  • Input: 540×960 (H×W), ImageNet-normalized
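As a minimal NumPy sketch (assumed shapes and names, not the actual implementation), the matching step pools the source feature map into one global descriptor, compares it against every camera location by cosine similarity, and reduces the similarity map into a few of the spatial statistics listed above:

```python
import numpy as np

def match_statistics(src_feat, cam_feat, coverage_thresh=0.5):
    """Sketch of global-descriptor matching at one scale.

    src_feat: (C, Hs, Ws) source feature map
    cam_feat: (C, Hc, Wc) camera feature map
    """
    # Global source descriptor: spatially pooled, L2-normalized
    g = src_feat.mean(axis=(1, 2))
    g = g / (np.linalg.norm(g) + 1e-8)

    # Normalize each camera location, then cosine similarity per location
    cam = cam_feat.reshape(cam_feat.shape[0], -1)            # (C, Hc*Wc)
    cam = cam / (np.linalg.norm(cam, axis=0, keepdims=True) + 1e-8)
    sim = g @ cam                                            # (Hc*Wc,)

    return {
        "max": sim.max(),
        "mean": sim.mean(),
        "top5": np.sort(sim)[-5:].mean(),
        "std": sim.std(),
        "coverage": (sim > coverage_thresh).mean(),          # fraction of locations above threshold
    }
```

The real model computes these statistics at two scales (s8, s16) after the projection heads, and adds L2-distance statistics; `coverage_thresh` here is a hypothetical value.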

Match definition

MATCH (positive): the source ad is visible in the camera frame, fully unoccluded and clearly displayed.

NO MATCH (negative): any of the following:

  • Source ad is not present in the camera frame
  • Camera screens are blacked out (display off or no signal)
  • Source ad is present but partially or fully occluded

Results

7-fold leave-one-camera-out cross-validation on 54 real fisheye pairs:

| Metric    | Target | CV Average | Status      |
|-----------|--------|------------|-------------|
| FN Rate   | ≤ 5%   | 28.6%      | In progress |
| FP Rate   | ≤ 20%  | 17.1%      | PASS        |
| Precision | ≥ 67%  | 81%        | PASS        |

Threshold: 0.35

An FN (false negative) is a false alarm: when the model predicts NO MATCH on a correctly broadcast ad, the monitoring system raises a spurious anomaly alert. Keeping the FN rate low is the top priority.
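The three metrics above follow from a standard confusion matrix over thresholded scores (1 = MATCH, 0 = NO MATCH). A minimal sketch, with the exact tie-breaking at the threshold assumed rather than taken from the model code:

```python
def confusion_metrics(y_true, y_pred):
    """FN rate, FP rate, and precision from binary labels (1 = MATCH)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "fn_rate": fn / (tp + fn) if tp + fn else 0.0,   # missed MATCHes -> spurious alerts
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def predict(scores, threshold=0.35):
    """Assumed decision rule: scores at or above the threshold are MATCH."""
    return [1 if s >= threshold else 0 for s in scores]
```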

Training

3-stage domain-adapted pipeline:

  1. Stage 1: Panel localization pretraining on synthetic data (IoU=0.945), then domain adaptation on real data with pseudo-labels (IoU=0.868)
  2. Stage 2: Siamese match pretraining on 12K synthetic pairs with panel-aware backbone
  3. Stage 3: Fine-tune on 54 real fisheye pairs with 7-fold leave-one-camera-out CV
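Stage 3's leave-one-camera-out split can be sketched as follows: each fold holds out every pair from one camera, so a model is never evaluated on a camera it trained on. `camera_ids` (one entry per pair) is a hypothetical input, not a name from the training code:

```python
def leave_one_camera_out(camera_ids):
    """Yield (train_indices, test_indices) per fold, one fold per camera."""
    folds = []
    for cam in sorted(set(camera_ids)):
        test_idx = [i for i, c in enumerate(camera_ids) if c == cam]
        train_idx = [i for i, c in enumerate(camera_ids) if c != cam]
        folds.append((train_idx, test_idx))
    return folds
```

With 54 pairs spread over 7 cameras this yields the 7 folds reported above.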

Usage

import torch
from huggingface_hub import hf_hub_download
from torchvision import transforms
from PIL import Image

# Load checkpoint
ckpt_path = hf_hub_download("Luigi/reproduction-verification", "model_final_with_threshold.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
threshold = ckpt["threshold"]  # 0.35

# See the Space demo for the full inline model definition:
# https://huggingface.co/spaces/Luigi/reproduction-verification

Demo

Live demo on HuggingFace Spaces
