Marine1: Underwater Acoustic Classifier
⚠️ IMPORTANT: Use Safetensors Format
This repository contains both `.pth` (pickle) and `.safetensors` formats.
Please use the `.safetensors` files to avoid pickle security vulnerabilities.
See How to Use section below for examples.
Model Description
Marine1 is a state-of-the-art deep learning model for classifying underwater acoustic events. Built on a fine-tuned ResNet18 architecture, it achieves 98.33% accuracy in distinguishing between four categories of marine soundscapes.
This model is designed for marine biologists, oceanographers, researchers, and conservationists working with underwater audio data.
Model Details
- Model Type: Audio Classification (CNN-based)
- Architecture: Fine-tuned ResNet18 (transfer learning from ImageNet)
- Input: Audio files (WAV, MP3, FLAC) - max 10 seconds
- Output: 4-class classification with confidence scores
- Framework: PyTorch 2.0+
- Parameters: ~11M parameters
- Training Time: ~10 minutes (4 epochs)
- Format: Available in both safetensors (recommended) and PyTorch formats
🔒 Security Note
This model is available in safetensors format, which is the recommended secure format that avoids pickle vulnerabilities.
Why Safetensors?
- ✅ No arbitrary code execution risks - Safe to load from any source
- ✅ Fast loading times - Optimized performance
- ✅ Memory-efficient - Better resource usage
- ✅ Cross-platform compatibility - Works everywhere
⚠️ The `.pth` files are provided for backward compatibility only.
Always prefer .safetensors for production use.
Learn more: Hugging Face Safetensors Documentation
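If you are starting from a legacy `.pth` checkpoint, a one-time conversion is straightforward. A minimal sketch, assuming the `.pth` file stores a plain state dict of tensors (the filenames follow this repo's naming):

```python
import torch
from safetensors.torch import save_file

# Refuse to unpickle arbitrary objects while reading the legacy file
# (works only if the .pth holds a plain state dict, not a full model object)
state_dict = torch.load(
    "best_model_finetuned.pth", map_location="cpu", weights_only=True
)

# Write the same tensors out in the safe format
save_file(state_dict, "best_model_finetuned.safetensors")
```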
Categories
The model classifies underwater sounds into four distinct categories:
🐋 Marine Animals (`marine_animal`)
- Whales, dolphins, orcas, and other marine mammals
- Fish sounds and biological vocalizations

🚢 Vessels (`vessel`)
- Ships, boats, submarines
- Maritime traffic and propeller sounds

🌊 Natural Sounds (`natural_sound`)
- Ocean waves, water movement
- Bubbles, rain, and environmental sounds

🔧 Other Anthropogenic (`other_anthropogenic`)
- Human-made sounds (non-vessel)
- Underwater construction, sonar, etc.
Performance
| Metric | Value |
|---|---|
| Accuracy | 98.33% |
| Balanced Accuracy | 91.67% |
| Training Epochs | 4 |
| Training Time | ~10 minutes |
Per-Class Performance
- Marine Animals: Excellent (high precision/recall)
- Vessels: Excellent (high precision/recall)
- Natural Sounds: Good (limited training data)
- Other Anthropogenic: Good (limited training data)
How to Use
Installation
```bash
pip install torch torchaudio librosa numpy safetensors
```
Quick Start (Recommended - Safetensors)
```python
import torch
import torch.nn as nn
import librosa
import numpy as np
from safetensors.torch import load_file
from torchvision.models import resnet18

# Load weights (secure format)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state_dict = load_file("best_model_finetuned.safetensors", device=str(device))

# Rebuild the model skeleton before loading the weights
# (the single-channel conv1 and 4-class head are assumed to match the
# checkpoint; adjust to the repo's model definition if loading fails)
model = resnet18()
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 4)
model.load_state_dict(state_dict)
model.to(device)

# Load and process audio
audio_path = "underwater_sound.wav"
y, sr = librosa.load(audio_path, sr=16000, duration=10.0)

# Create mel spectrogram
mel_spec = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=2048,
    hop_length=512, fmax=8000
)
log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)

# Prepare input: (batch, channel, mel_bands, time_frames)
input_tensor = torch.FloatTensor(log_mel_spec).unsqueeze(0).unsqueeze(0).to(device)

# Predict
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)[0]
    predicted_class = probabilities.argmax().item()
    confidence = probabilities[predicted_class].item()

# Map to class names
class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"]
print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")
```
Using the Inference Class (Easiest)
```python
from huggingface_hub import hf_hub_download
from inference import Marine1Classifier

# Download model (safetensors format - secure!)
model_path = hf_hub_download(
    repo_id="shiv207/Marine1",
    filename="best_model_finetuned.safetensors"
)

# Initialize classifier
classifier = Marine1Classifier(model_path)

# Make prediction
result = classifier.predict("underwater_sound.wav")
print(f"Prediction: {result['predicted_class']}")
print(f"Confidence: {result['confidence']*100:.2f}%")
```
Using the Complete Pipeline
For a full-featured implementation with preprocessing and JSON output:
```bash
# Clone the repository
git clone https://github.com/shiv207/underwater-audio-classifier.git
cd underwater-audio-classifier

# Install dependencies
pip install -r requirements.txt

# Run prediction (supports both .pth and .safetensors)
python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors

# Generate UDA-compliant JSON
python generate_json.py --audio your_audio.wav --output result.json
```
Streamlit Web Interface
```bash
streamlit run app.py
```
Features:
- Drag-and-drop multiple audio files
- Batch processing with progress tracking
- Visual spectrograms and probability charts
- Export results as JSON (individual or batch)
- Event detection mode with temporal localization
Training Data
The model was trained on a diverse dataset of underwater acoustic recordings:
- Marine Animals: Whale songs, dolphin clicks, orca vocalizations
- Vessels: Various ship types (cargo, tanker, passenger, tug)
- Natural Sounds: Ocean waves, water movement, environmental sounds
- Other Sounds: Anthropogenic non-vessel sounds
Note: Natural sounds and other anthropogenic categories have limited training samples (13 each), which may affect performance on edge cases.
Technical Details
Audio Processing Pipeline
- Resampling: Audio resampled to 16kHz
- Duration: Truncated/padded to 10 seconds
- Spectrogram: 128-band mel spectrogram (n_fft=2048, hop_length=512)
- Normalization: Log-scale conversion with dB normalization
- Augmentation (training only): Speed, pitch, noise injection
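Put together, the steps above can be reproduced with a few lines of librosa. A minimal sketch (zero-padding of short clips is an assumption based on the "truncated/padded" note):

```python
import librosa
import numpy as np
import torch

def preprocess(audio_path, sr=16000, duration=10.0):
    # Steps 1-2: resample to 16 kHz, truncate to 10 s on load
    y, _ = librosa.load(audio_path, sr=sr, duration=duration)
    # Zero-pad short clips to the full 10-second window (assumed behavior)
    target_len = int(sr * duration)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    # Step 3: 128-band mel spectrogram
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=128, n_fft=2048, hop_length=512, fmax=8000
    )
    # Step 4: log-scale (dB) conversion
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Shape for the CNN: (batch, channel, mel_bands, time_frames)
    return torch.FloatTensor(log_mel).unsqueeze(0).unsqueeze(0)
```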
Model Architecture
```
ResNet18 (Pre-trained on ImageNet)
├── Conv1 (frozen)
├── Layer1 (frozen)
├── Layer2 (fine-tuned)
├── Layer3 (fine-tuned)
├── Layer4 (fine-tuned)
└── FC Layer (4 classes, trained from scratch)
```
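A sketch of how this freezing scheme can be set up with torchvision; the single-channel stem (averaging the pretrained RGB kernels) is an assumption made to match the one-channel spectrogram input, not necessarily the repo's exact adaptation:

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

# Adapt the stem to 1-channel spectrograms by averaging the pretrained
# RGB kernels (an assumption; adjust to match the repo's code)
rgb_kernels = model.conv1.weight.data
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.conv1.weight.data = rgb_kernels.mean(dim=1, keepdim=True)

# New 4-class head, trained from scratch
model.fc = nn.Linear(model.fc.in_features, 4)

# Freeze the early layers; layer2-4 and fc stay trainable
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = False
```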
Training Configuration
- Optimizer: Adam (lr=0.0001)
- Loss: CrossEntropyLoss
- Batch Size: 8
- Epochs: 4
- Data Split: 80% train, 20% validation
- Device: CPU/MPS (Apple Silicon compatible)
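A minimal training-loop sketch matching this configuration (`model` is the network from the sketch above; the random tensors stand in for the real spectrogram dataset):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data shaped like the 10 s / 16 kHz log-mel spectrograms
dataset = TensorDataset(
    torch.randn(16, 1, 128, 313), torch.randint(0, 4, (16,))
)
train_loader = DataLoader(dataset, batch_size=8, shuffle=True)

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

for epoch in range(4):
    model.train()
    for specs, labels in train_loader:
        specs, labels = specs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(specs), labels)
        loss.backward()
        optimizer.step()
```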
Limitations
- Limited Data: Natural sounds and other anthropogenic categories have fewer training samples
- Duration: Optimized for 10-second audio clips
- Sample Rate: Best performance at 16kHz
- Domain: Trained on specific underwater environments; may need fine-tuning for different acoustic conditions
- Background Noise: Performance may degrade with high noise levels
Intended Use
Primary Use Cases
- Marine biology research and species monitoring
- Vessel traffic analysis and maritime surveillance
- Environmental impact assessment
- Underwater acoustic event detection
- Marine conservation and ecosystem monitoring
Out-of-Scope Use
- Real-time streaming (model requires preprocessing)
- Non-underwater audio classification
- Medical or security applications
- High-stakes decision making without human oversight
Ethical Considerations
- This model is intended for research and conservation purposes
- Should not be used as the sole basis for regulatory or enforcement decisions
- May have biases based on training data distribution
- Performance varies across different marine environments
- Users should validate results in their specific context
Citation
If you use this model in your research, please cite:
```bibtex
@misc{marine1-underwater-classifier,
  author = {Shivam Sharma},
  title = {Marine1: Underwater Acoustic Classifier},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shiv207/Marine1}},
}
```
Model Variants
This repository includes multiple model formats:
Safetensors Format (🔒 Recommended - Secure)
`best_model_finetuned.safetensors` ✅
- Fine-tuned ResNet18
- 98.33% accuracy
- Secure format (no pickle vulnerabilities)
- Best overall performance
`best_model_simple.safetensors`
- Custom CNN trained from scratch
- 93% accuracy
- Lighter weight alternative
- Secure format
Legacy Formats (⚠️ Not Recommended)
`best_model_finetuned.pth`
- PyTorch pickle format (legacy)
- ⚠️ Flagged with pickle security warnings
- Use the safetensors version instead
`best_model_simple.pth`
- PyTorch pickle format (legacy)
- ⚠️ Flagged with pickle security warnings
- Use the safetensors version instead
`Marine 1.mlmodel`
- CoreML format for iOS/macOS deployment
- Optimized for Apple devices
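The CoreML variant can be sanity-checked from Python with coremltools before wiring it into an iOS/macOS app (a quick inspection sketch, not a deployment recipe):

```python
import coremltools as ct

# Load the CoreML model (runs on macOS with coremltools installed)
mlmodel = ct.models.MLModel("Marine 1.mlmodel")

# Print the declared input/output features
print(mlmodel.get_spec().description)
```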
Additional Resources
- GitHub Repository: underwater-audio-classifier
- Demo Application: Streamlit web interface included
- Documentation: Comprehensive README and code comments
- JSON Export: Grand Challenge UDA format compatible
License
MIT License - Free for commercial and research use
Acknowledgments
- Built with PyTorch and Streamlit
- ResNet18 architecture from torchvision
- Audio processing with librosa
- Inspired by marine conservation efforts
Contact
For questions, issues, or collaboration opportunities:
- GitHub: @shiv207
- Repository Issues: Report a bug
Model Card Authors: Shivam Sharma
Last Updated: October 2024