MC3-18 for UCF-101 Action Recognition
Model Summary
This model is an MC3-18 (Mixed 3D Convolutions) network fine-tuned on the UCF-101 dataset for human action recognition. The architecture uses full 3D convolutions in the early layers and spatial-only (2D) convolutions in the deeper layers, giving an efficient spatio-temporal representation with a relatively small parameter count.
- Architecture: MC3-18 (3D CNN with mixed convolutions)
- Pretraining: Kinetics-400
- Parameter Count: ~11.7M
- Input Format: 16-frame clips, 112×112 spatial resolution
- Number of Classes: 101
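For reference, a minimal sketch of how the backbone and input shape fit together, using torchvision's `mc3_18` (the released checkpoint may have been built differently; the random clip below is only a placeholder):

```python
import torch
from torchvision.models.video import mc3_18

# Build an MC3-18 backbone and swap in a 101-way classification head
# (sketch only; the released checkpoint may store the full model instead)
model = mc3_18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 101)

# Expected input: batches of 16-frame clips at 112x112, shaped (N, C, T, H, W)
clip = torch.randn(1, 3, 16, 112, 112)   # random placeholder clip
logits = model(clip)
print(logits.shape)                      # torch.Size([1, 101])
```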
Intended Use
Primary use case: Action classification in short, trimmed videos similar in distribution to UCF-101.
Users: Researchers, practitioners, and engineers working on video-understanding pipelines.
Tasks:
- Action recognition
- Clip-level human activity tagging
- Baseline modeling for low-compute video applications
Not suitable for long-horizon temporal reasoning or untrimmed video detection without adaptation.
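For longer or untrimmed videos, one simple adaptation is to slide a 16-frame window over the video and aggregate clip-level scores. The sketch below assumes a loaded `model` and an already-preprocessed frame tensor; it is an illustration, not the method used to produce the reported results:

```python
import torch

def sliding_window_logits(model, frames, clip_len=16, stride=8):
    """Score overlapping 16-frame windows of a longer video and average
    the clip-level logits. `frames` is assumed to be a preprocessed
    tensor of shape (C, T, H, W)."""
    model.eval()
    scores = []
    with torch.no_grad():
        for start in range(0, frames.shape[1] - clip_len + 1, stride):
            clip = frames[:, start:start + clip_len]      # (C, 16, H, W)
            scores.append(model(clip.unsqueeze(0)))       # (1, 101)
    return torch.stack(scores).mean(dim=0)                # averaged logits, (1, 101)
```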
Performance
Quantitative Results (UCF-101 Split 1, Test Set)
| Metric | Value |
|---|---|
| Accuracy | 87.05% |
| F1 Score | 0.857 |
| Precision | 0.868 |
Comparison to Published Baseline
- Original MC3-18 (Kinetics-400 → UCF-101): 85.0%
- This model: 87.05% (+2.05%)
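The table does not state the averaging scheme used for F1 and precision; a minimal sketch of how such multi-class metrics are commonly computed with scikit-learn, assuming macro averaging and using hypothetical labels:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Hypothetical per-clip ground-truth and predicted class indices
y_true = [0, 3, 3, 7, 12, 12]
y_pred = [0, 3, 5, 7, 12, 12]

acc  = accuracy_score(y_true, y_pred)
f1   = f1_score(y_true, y_pred, average="macro")
prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.4f}  f1={f1:.4f}  precision={prec:.4f}")
```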
How to Use
Inference Example (PyTorch)
```python
import torch
from huggingface_hub import hf_hub_download
from torchvision.transforms import Compose, Resize, CenterCrop, Normalize, ToTensor

# Download the fine-tuned checkpoint from the Hugging Face Hub
model_path = hf_hub_download(repo_id="dronefreak/mc3-18-ucf101",
                             filename="mc318-ufc101-split-1.pth")
# The checkpoint is assumed to contain the full serialized model
model = torch.load(model_path)
model.eval()

# Per-frame preprocessing (16 frames, stacked to C x T x H x W)
transform = Compose([
    Resize((128, 171)),
    CenterCrop(112),
    ToTensor(),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989])
])

# Inference on a preprocessed clip `video_tensor` of shape (1, 3, 16, 112, 112)
with torch.no_grad():
    output = model(video_tensor)
    prediction = output.argmax(dim=1)
```
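The snippet above leaves `video_tensor` undefined. A sketch of one way to build it with `torchvision.io.read_video` (the file name is hypothetical, and `transform` refers to the pipeline defined above):

```python
import torch
from torchvision.io import read_video
from torchvision.transforms.functional import to_pil_image

# Decode a video (file name is hypothetical) and uniformly sample 16 frames
frames, _, _ = read_video("v_Basketball_g01_c01.avi", pts_unit="sec")  # (T, H, W, C), uint8
indices = torch.linspace(0, frames.shape[0] - 1, steps=16).long()

# Apply the per-frame `transform` from above, then stack to (1, C, T, H, W)
clip = [transform(to_pil_image(frames[i].permute(2, 0, 1))) for i in indices]
video_tensor = torch.stack(clip, dim=1).unsqueeze(0)  # (1, 3, 16, 112, 112)
```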
Training
- Dataset: UCF-101 Split 1 (9,537 train / 3,783 test videos)
- Epochs: 200
- Batch Size: 32
- Optimizer: SGD (lr=0.001, momentum=0.9, weight_decay=1e-4)
- Augmentation: ColorJitter, RandomHorizontalFlip, RandomCrop
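A sketch of the optimizer and per-frame augmentation implied by these hyperparameters; the ColorJitter strengths, transform ordering, and reuse of the Kinetics normalization statistics are assumptions, and `model` is assumed to be the loaded network from the section above:

```python
import torch
from torchvision.transforms import (Compose, Resize, ColorJitter,
                                    RandomHorizontalFlip, RandomCrop,
                                    ToTensor, Normalize)

# Optimizer matching the listed hyperparameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)

# Per-frame training augmentation consistent with the list above
train_transform = Compose([
    Resize((128, 171)),
    ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # assumed strengths
    RandomHorizontalFlip(),
    RandomCrop(112),
    ToTensor(),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989]),
])
```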
Limitations
- Fine-tuned only on UCF-101, so predictions are limited to its 101 action classes
- Requires 16-frame clips (not suited to single-frame or strictly frame-by-frame real-time inference)
- Performs best on action types similar to those in UCF-101
Citation
@misc{mc3_18_ucf101,
author = {Saumya Saksena},
title = {MC3-18 for UCF-101 Action Recognition},
year = {2024},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/dronefreak/mc3-18-ucf101}}
}
License
Apache-2.0
Evaluation results
- Top-1 Accuracy (UCF-101 test set, self-reported): 87.05
- F1 Score (UCF-101 test set, self-reported): 85.69
