๐Ÿ™ GitHub ๐Ÿ“„ Paper: MC3 ๐Ÿ’ฝ Dataset: UCF-101 โš–๏ธ License: Apache 2.0

MC3-18 for UCF-101 Action Recognition

Model Summary

This model is an MC3-18 (Mixed 3D Convolutions) network fine-tuned on the UCF-101 dataset for human action recognition. The architecture uses 3D convolutions in its early layers and 2D convolutions in its later layers, so it captures short-range motion cues while keeping the parameter count low.

  • Architecture: MC3-18 (3D CNN with mixed convolutions)
  • Pretraining: Kinetics-400
  • Parameter Count: ~11.7M
  • Input Format: 16-frame clips, 112×112 spatial resolution
  • Number of Classes: 101

Intended Use

Primary use case: Action classification in short, trimmed videos similar in distribution to UCF-101.
Users: Researchers, practitioners, and engineers working on video-understanding pipelines.
Tasks:

  • Action recognition
  • Clip-level human activity tagging
  • Baseline modeling for low-compute video applications

Not suitable for long-horizon temporal reasoning or untrimmed video detection without adaptation.


Performance

Quantitative Results (UCF-101 Split 1, Test Set)

Metric       Value
Accuracy     87.05%
F1 Score     0.857
Precision    0.868
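
The card does not say how F1 and precision were averaged across the 101 classes. The sketch below shows one common choice, macro averaging (an unweighted mean of per-class scores), purely as an assumption. `macro_metrics` is a hypothetical helper, and the labels are toy data, not UCF-101 predictions.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for integer class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true class was t
            fn[t] += 1  # missed an instance of class t

    precisions, f1s = [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / len(classes), sum(f1s) / len(classes)


# Toy example with three classes (not UCF-101 data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} f1={f1:.3f}")
```

If the reported numbers used weighted averaging instead, the per-class scores would be weighted by class frequency before summing.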

Comparison to Published Baseline

  • Original MC3-18 (Kinetics-400 → UCF-101): 85.0%
  • This model: 87.05% (+2.05 percentage points)

How to Use

Inference Example (PyTorch)

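import torch
from huggingface_hub import hf_hub_download
from torchvision.transforms import Compose, Resize, CenterCrop, Normalize, ToTensor

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(repo_id="dronefreak/mc3-18-ucf101", filename="mc318-ufc101-split-1.pth")
# Assumes the file holds a fully pickled model, as the original snippet implies.
# PyTorch >= 2.6 defaults to weights_only=True, so it is disabled explicitly here.
model = torch.load(model_path, map_location="cpu", weights_only=False)
model.eval()

# Per-frame preprocessing (applied to each PIL image)
transform = Compose([
    Resize((128, 171)),
    CenterCrop(112),
    ToTensor(),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989])
])

# Prepare video: `frames` is a list of 16 PIL images supplied by the caller.
# Stack to T×C×H×W, reorder to C×T×H×W, then add a batch dimension.
video_tensor = torch.stack([transform(f) for f in frames]).permute(1, 0, 2, 3).unsqueeze(0)

# Inference
with torch.no_grad():
    output = model(video_tensor)
    prediction = output.argmax(dim=1)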
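
The example expects exactly 16 frames, but the card does not say how those frames are picked from a longer video. One simple option, shown here as an assumption and not as the author's method, is to take frames spread evenly over the video; taking 16 consecutive frames is another common choice. `sample_frame_indices` is a hypothetical helper.

```python
def sample_frame_indices(num_frames: int, clip_len: int = 16) -> list[int]:
    """Pick `clip_len` frame indices spread evenly over a video of `num_frames` frames."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames < clip_len:
        # Short video: repeat the last frame to pad the clip
        return list(range(num_frames)) + [num_frames - 1] * (clip_len - num_frames)
    # Take the centre frame of each of `clip_len` equal-width segments
    step = num_frames / clip_len
    return [int(step * i + step / 2) for i in range(clip_len)]


indices = sample_frame_indices(160)
print(indices)
```

The returned indices can be used to select frames from a decoded video before applying the per-frame transform.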

Training

  • Dataset: UCF-101 Split 1 (9,537 train / 3,783 test videos)
  • Epochs: 200
  • Batch Size: 32
  • Optimizer: SGD (lr=0.001, momentum=0.9, weight_decay=1e-4)
  • Augmentation: ColorJitter, RandomHorizontalFlip, RandomCrop
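
The optimizer settings listed above map directly onto `torch.optim.SGD`. The sketch below runs a single training step with them. The cross-entropy loss is an assumption (the card does not name the loss), `make_optimizer` and `train_step` are hypothetical helpers, and the tiny network is a stand-in so the example runs quickly; it is not MC3-18.

```python
import torch
from torch import nn


def make_optimizer(model: nn.Module) -> torch.optim.SGD:
    """SGD with the hyperparameters listed in the card."""
    return torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)


def train_step(model, optimizer, clips, labels) -> float:
    """Run one optimisation step and return the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(clips), labels)  # assumed loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Tiny stand-in network so the sketch runs quickly; not the MC3-18 architecture
torch.manual_seed(0)
toy_model = nn.Sequential(
    nn.Conv3d(3, 4, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(4, 101),
)
optimizer = make_optimizer(toy_model)

clips = torch.randn(2, 3, 16, 112, 112)  # batch x C x T x H x W
labels = torch.tensor([5, 42])           # class indices in [0, 101)
loss = train_step(toy_model, optimizer, clips, labels)
print(f"loss after one step: {loss:.4f}")
```

In a real run the batch size would be 32 and the clips would come from a UCF-101 data loader with the listed augmentations applied.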

Limitations

  • Trained only on UCF-101 (limited to 101 action classes)
  • Requires 16-frame clips (not suitable for real-time single-frame inference)
  • Performs best on action types similar to those in UCF-101

Citation

@misc{mc3_18_ucf101,
  author = {Saumya Saksena},
  title = {MC3-18 for UCF-101 Action Recognition},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/dronefreak/mc3-18-ucf101}}
}

License

Apache-2.0
